Schema to load JSON data into Google BigQuery

I have a question for the project that we are doing...
I tried to load this JSON into Google BigQuery, but I'm not able to get the fields of the votes object from the JSON input. I tried both the "record" and the "string" types in the schema.
{
"votes": {
"funny": 10,
"useful": 10,
"cool": 10
},
"user_id": "OlMjqqzWZUv2-62CSqKq_A",
"review_id": "LMy8UOKOeh0b9qrz-s1fQA",
"stars": 4,
"date": "2008-07-02",
"text": "This is what this 4-star bar is all about.",
"type": "review",
"business_id": "81IjU5L-t-QQwsE38C63hQ"
}
Also, I'm not able to get the tables populated from the JSON below for the categories and neighborhoods arrays. What should my schema be for these inputs? Unfortunately the docs didn't help much in this case, or maybe I'm not looking in the right place.
{
"business_id": "Iu-oeVzv8ZgP18NIB0UMqg",
"full_address": "3320 S Hill St\nSouth East LA\nLos Angeles, CA 90007",
"schools": [
"University of Southern California"
],
"open": true,
"categories": [
"Medical Centers",
"Health and Medical"
],
"neighborhoods": [
"South East LA"
]
}
I am able to get the regular fields, but that's about it... Any help is appreciated!

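For the business data, it seems you want schools, categories and neighborhoods to be repeated fields. Your schema should be:
"schema": {
"fields": [
{
"name": "business_id",
"type": "string"
},
{
"name": "full_address",
"type": "string"
},
{
"name": "schools",
"type": "string",
"mode": "repeated"
},
{
"name": "open",
"type": "boolean"
},
{
"name": "categories",
"type": "string",
"mode": "repeated"
},
{
"name": "neighborhoods",
"type": "string",
"mode": "repeated"
}
]
}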
For votes, you want a record (nested) field. Your schema should be:
"schema": {
"fields": [
{
"name": "user_id",
"type": "string"
},
{
"name": "votes",
"type": "record",
"fields": [
{
"name": "funny",
"type": "integer"
},
{
"name": "useful",
"type": "integer"
},
{
"name": "cool",
"type": "integer"
}
]
}
]
}

I was also stuck on this problem; in my case the issue was that I had to remember to set the mode to repeated for the records.
Also note that these repeated fields cannot contain null values.
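To make the mapping between the JSON input and the schema explicit, here is a minimal Python sketch, under stated assumptions, that derives a BigQuery-style fields list from one sample record: objects become RECORD fields with nested fields, arrays become REPEATED fields typed by their first element, and everything is written as newline-delimited JSON, which is the JSON layout BigQuery loads. The helper names (bq_type, infer_fields, to_ndjson) are hypothetical and the type mapping is a simplification (for example, dates stay STRING); BigQuery also has its own schema auto-detection (such as bq load --autodetect), which this does not reproduce.

```python
import json

def bq_type(value):
    """Map a sample JSON value to a BigQuery legacy type name (simplified assumption)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int in Python
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    if isinstance(value, dict):
        return "RECORD"
    return "STRING"  # strings, None and anything else fall back to STRING

def infer_fields(record):
    """Build a BigQuery-style "fields" list from one sample record."""
    fields = []
    for name, value in record.items():
        mode = "NULLABLE"
        if isinstance(value, list):
            # Arrays become REPEATED fields, typed by their first element
            # (assumes homogeneous arrays; an empty array is treated as STRING).
            mode = "REPEATED"
            value = value[0] if value else ""
        field = {"name": name, "type": bq_type(value), "mode": mode}
        if field["type"] == "RECORD":
            field["fields"] = infer_fields(value)  # recurse into nested objects
        fields.append(field)
    return fields

def to_ndjson(records):
    """Serialize records as newline-delimited JSON: one compact object per line."""
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records) + "\n"
```

For the review record, votes comes out as a RECORD with three INTEGER sub-fields; for the business record, schools, categories and neighborhoods come out as REPEATED STRING fields, matching the schemas above.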

Related

JsonSchema definition path and subschema re-use

Let's say I have two schemas defined as follows:
ADDRESS_CLASS_SCHEMA_DEFINITION = {
"title": "Address",
"type": "object",
"properties": {
"country_code": {
"$ref": "#/definitions/CountryCode"
},
"city_code": {
"title": "City Code",
"type": "string"
},
"zipcode": {
"title": "Zipcode",
"type": "string"
},
"address_str": {
"title": "Address Str",
"type": "string"
}
},
"required": [
"country_code",
"city_code",
"zipcode"
],
"definitions": {
"CountryCode": {
"title": "CountryCode",
"description": "An enumeration.",
"enum": [
"CA",
"USA",
"UK"
],
"type": "string"
}
}
}
EMPLOYEE_CLASS_SCHEMA_DEFINITION = {
"title": "Employee",
"type": "object",
"properties": {
"id": {
"title": "Id",
"type": "integer"
},
"name": {
"title": "Name",
"type": "string"
},
"email": {
"title": "Email",
"type": "string"
},
"telephone": {
"title": "Telephone",
"type": "string"
},
"address": {
"$ref": "#/definitions/Address"
}
},
"required": [
"id",
"name",
"email"
],
"definitions": {
"Address": ADDRESS_CLASS_SCHEMA_DEFINITION
}
}
I'm trying to re-use sub-schema definitions by defining a constant and referencing it in definitions (for example, the Address schema is referenced through a constant in the Employee schema's definitions). This approach works for the individual schemas; however, there seems to be a JSON Pointer path issue in the Employee schema: #/definitions/CountryCode doesn't resolve there. I assumed that #/definitions/CountryCode would be a path relative to the Address schema, since it is defined inside that sub-schema, but my understanding seems to be wrong. I can make it work by flattening everything out as below, but I do not want to take this route:
{
"title": "Employee",
"type": "object",
"properties": {
"id": {
"title": "Id",
"type": "integer"
},
"name": {
"title": "Name",
"type": "string"
},
"email": {
"title": "Email",
"type": "string"
},
"telephone": {
"title": "Telephone",
"type": "string"
},
"address": {
"$ref": "#/definitions/Address"
}
},
"required": [
"id",
"name",
"email"
],
"definitions": {
"CountryCode": {
"title": "CountryCode",
"description": "An enumeration.",
"enum": [
"CA",
"USA",
"UK"
],
"type": "string"
},
"Address": {
"title": "Address",
"type": "object",
"properties": {
"country_code": {
"$ref": "#/definitions/CountryCode"
},
"city_code": {
"title": "City Code",
"type": "string"
},
"zipcode": {
"title": "Zipcode",
"type": "string"
},
"address_str": {
"title": "Address Str",
"type": "string"
}
},
"required": [
"country_code",
"city_code",
"zipcode"
]
}
}
}
I'm wondering how to fix this. I've briefly looked into JSON Schema bundling and using $id, but from what I've read on best practices, the general recommendation seems to be to use $id only when dealing with URIs. I'd like to know the best practices and how to fix this problem, and I would also appreciate a pointer on how to use $id correctly (for example, the constant-based approach seems to work when I provide identifiers like $id: Address, $id: Employee). Thanks in advance.
JSON Schema implementations work in JSON land. When you combine your schemas as in your example above, presumably in JavaScript/Node.js, by the time the result reaches the JSON Schema implementation for validation, any knowledge that there were separate schemas is lost. (This is generally not considered the best approach.)
The EASY fix here SHOULD be just to define $id at the root of each of your schemas. These should be fully qualified URIs. It doesn't really matter what they are at this point; they could be https://example.com/a and https://example.com/b. Then, in the primary schema, you can use $ref: https://example.com/b.
Implementations should provide a way to load your other (non-primary) schemas so that their $id values can be stored in an index. Using $id with a fully qualified URI in your other schema signifies a "resource boundary".
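To illustrate why this fixes the pointer problem, here is a minimal, stdlib-only Python sketch of how an implementation might resolve $ref: each schema resource is indexed by its absolute $id, and a $ref is resolved against the $id of the resource it appears in. That way #/definitions/CountryCode inside the Address schema lands in Address's own definitions, while the same pointer resolved against the Employee root fails, which is the behaviour described in the question. The example.com URIs and the helper names (build_index, resolve_pointer, resolve_ref) are hypothetical; real validators have their own registry APIs, which this does not reproduce.

```python
from urllib.parse import urldefrag, urljoin

# Hypothetical $id values; any fully qualified URIs would do.
ADDRESS = {
    "$id": "https://example.com/address",
    "type": "object",
    "properties": {"country_code": {"$ref": "#/definitions/CountryCode"}},
    "definitions": {"CountryCode": {"enum": ["CA", "USA", "UK"], "type": "string"}},
}
EMPLOYEE = {
    "$id": "https://example.com/employee",
    "type": "object",
    "properties": {"address": {"$ref": "https://example.com/address"}},
}

def build_index(*schemas):
    """Index each schema resource by its absolute $id."""
    return {schema["$id"]: schema for schema in schemas}

def resolve_pointer(doc, pointer):
    """Walk a JSON Pointer such as '/definitions/CountryCode' (RFC 6901)."""
    node = doc
    if pointer == "":
        return node  # empty pointer means the whole document
    for token in pointer[1:].split("/"):
        token = token.replace("~1", "/").replace("~0", "~")  # RFC 6901 unescaping order
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node

def resolve_ref(index, base_uri, ref):
    """Resolve ref against base_uri (the $id of the enclosing resource)."""
    doc_uri, fragment = urldefrag(urljoin(base_uri, ref))
    return doc_uri, resolve_pointer(index[doc_uri], fragment)
```

Following Employee's address $ref yields the Address resource with base URI https://example.com/address, and resolving country_code's $ref against that base finds the CountryCode enum; resolving #/definitions/CountryCode against the Employee base raises a KeyError instead.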
https://json-schema.hyperjump.io is the only web playground to support multiple files/schemas/"Schema Resources", so you can test this out there to confirm your expectations.
Not all implementations make it easy or even provide a means to import your other schemas, but they should.
If you have follow up questions, feel free to leave a comment, or join the JSON Schema slack server if it would be off-topic for StackOverflow.

How to build a JSON file from a CSV using an existing JSON schema format?

I have a JSON schema and a CSV file. The CSV file has 2,511 rows and one header row (2,512 rows total). Each row has 43 columns. I was able to convert the CSV to JSON using one of the myriad online converters, but the result is what I believe is termed a 'flat JSON file'.
Here is the CSV header row:
F1,F2,F3.1.F1,F3.1.F2,F3.1.F3,F3.1.F4,...F3.10.F1,F3.10.F2,F3.10.F3,F3.10.F4,F4
Here is my JSON schema:
{
"$schema": "http://json-schema.org/schema#",
"$id": "./.schema.json",
"title": "",
"description": "",
"type": "object",
"properties": {
"F1": {
"description": "",
"type": "string"
},
"F2": {
"description": "",
"type": "string"
},
"F3": {
"description": "",
"type": "array",
"items": {
"description": "",
"type": "object",
"properties": {
"F3.F1": {
"description": "",
"type": "string"
},
"F3.F2": {
"description": "",
"type": "string"
},
"F3.F3": {
"description": "",
"type": "string"
},
"F3.F4": {
"description": "",
"type": "string"
}
},
"required": [
"F3.F1",
"F3.F2",
"F3.F3",
"F3.F4"
]
},
"minItems": 10,
"maxItems": 10,
"uniqueItems": false
},
"F4": {
"description": "",
"type": "string"
}
},
"required": [
"F1",
"F2",
"F3",
"F4"
],
"additionalProperties": false
}
From the CSV->JSON conversion, my JSON file looks like:
[
{
"F1": 2429546524130460000,
"F2": 2429519276857919500,
"F3.1.F1": 2428316170619109000,
"F3.1.F2": 0.0690932185744956,
"F3.1.F3": 2.6355498567408557,
"F3.1.F4": 0.4369495787854096,
...
"F3.10.F1": 2429415922764859400,
"F3.10.F2": 0.15328371980044203,
"F3.10.F3": 2.677944208300451,
"F3.10.F4": 0.31036472544281585,
"F4": 0.16889514829995647
},
... //repeated 2,509 times
{
"F1": 1143081876266241000,
"F2": 1143588785487818100,
"F3.1.F1": 1141377392726037800,
"F3.1.F2": 1.332366799133926,
"F3.1.F3": 0.24878185970548322,
"F3.1.F4": 1.560443994684636,
...
"F3.10.F1": "XXX",
"F3.10.F2": "XXX",
"F3.10.F3": "XXX",
"F3.10.F4": "XXX",
"F4": 2.2916768389567497
}
]
Clearly, making the necessary changes 2,511 times is impractical, so I am hoping there is a method to make the changes automatically. I can code, but I could not find any specific solutions anywhere to go from a CSV to a JSON with the JSON output matching a specific JSON schema. Preferably, I would like a solution that is not restricted to just converting this one set of data to this one specific format, i.e., a general solution that could be used with a different CSV and different JSON schema.
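One way to automate this is to let the schema drive the reshaping. Below is a minimal Python sketch, under these assumptions: array-of-object properties are spread across columns named Prop.N.Sub (N being the 1-based item number), item properties are named Prop.Sub as in the schema above (falling back to Sub otherwise), only one level of nesting is handled, and the helper names are hypothetical. The csv module keeps every cell as a string, which is what the schema's "type": "string" fields expect.

```python
import csv
import io
import re

def row_to_object(row, schema):
    """Reshape one flat CSV row (a dict) into the layout described by schema."""
    out = {}
    for name, prop in schema["properties"].items():
        items_schema = prop.get("items", {})
        if prop.get("type") == "array" and items_schema.get("type") == "object":
            item_props = items_schema.get("properties", {})
            # Matches headers like "F3.1.F1" -> item number 1, sub-field "F1".
            pattern = re.compile(rf"^{re.escape(name)}\.(\d+)\.(.+)$")
            items = {}
            for column, value in row.items():
                match = pattern.match(column)
                if not match:
                    continue
                index, sub = int(match.group(1)), match.group(2)
                # Assumed naming convention: column "F3.<n>.F1" -> item key "F3.F1".
                key = f"{name}.{sub}" if f"{name}.{sub}" in item_props else sub
                items.setdefault(index, {})[key] = value
            # Sort numerically so that item 10 comes after item 9, not after item 1.
            out[name] = [items[i] for i in sorted(items)]
        elif name in row:
            out[name] = row[name]
    return out

def csv_to_json(csv_text, schema):
    """Convert CSV text (with a header row) into a list of schema-shaped objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row_to_object(row, schema) for row in reader]
```

Because the function only reads "properties", "type" and "items" from the schema, the same code could be reused with a different CSV and schema that follow the same column-naming convention; the result can then be written out with json.dump.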

Is there any way to define a scoping mechanism in JSON Schema for Arrays of Objects?

I would like to use JSON Schema to validate my data which exists as an array of objects. In this use-case, I have a list of people and I want to make sure they possess certain properties, but these properties aren't exhaustive.
For instance, if we have a person named Bob, I want to make sure that Bob's height, ethnicity and location are set to certain values. But I don't care much about Bob's other properties, like hobbies, weight or relationshipStatus.
There is one caveat: there can be multiple Bobs, so I don't want to check every Bob. Each person has a unique ID, and I want to check a person's properties by the specified id.
Here is an example of all the people that exist:
{
"people": [
{
"name": "Bob",
"id": "ei75dO",
"age": "36",
"height": "68",
"ethnicity": "american",
"location": "san francisco",
"weight": "174",
"relationshipStatus": "married",
"hobbies": ["camping", "traveling"]
},
{
"name": "Leslie",
"id": "UMZMA2",
"age": "32",
"height": "65",
"ethnicity": "american",
"location": "pawnee",
"weight": "139",
"relationshipStatus": "married",
"hobbies": ["politics", "parks"]
},
{
"name": "Kapil",
"id": "HkfmKh",
"age": "27",
"height": "71",
"ethnicity": "indian",
"location": "mumbai",
"weight": "166",
"relationshipStatus": "single",
"hobbies": ["tech", "games"]
},
{
"name": "Arnaud",
"id": "xSiIDj",
"age": "42",
"height": "70",
"ethnicity": "french",
"location": "paris",
"weight": "183",
"relationshipStatus": "married",
"hobbies": ["cooking", "reading"]
},
{
"name": "Kapil",
"id": "fDnweF",
"age": "38",
"height": "67",
"ethnicity": "indian",
"location": "new delhi",
"weight": "159",
"relationshipStatus": "married",
"hobbies": ["tech", "television"]
},
{
"name": "Gary",
"id": "ZX43NI",
"age": "29",
"height": "69",
"ethnicity": "british",
"location": "london",
"weight": "172",
"relationshipStatus": "single",
"hobbies": ["parkour", "guns"]
},
{
"name": "Jim",
"id": "uLqbVe",
"age": "26",
"height": "72",
"ethnicity": "american",
"location": "scranton",
"weight": "179",
"relationshipStatus": "single",
"hobbies": ["parkour", "guns"]
}
]
}
And here is what I specifically want to check for in each person:
{
"$schema": "https://json-schema.org/draft/2019-09/schema",
"type": "object",
"properties": {
"people": {
"type": "array",
"contains": {
"anyOf": [
{
"type": "object",
"properties": {
"id": {
"const": "ei75dO"
},
"name": {
"const": "Bob"
},
"ethnicity": {
"const": "american"
},
"location": {
"const": "los angeles"
},
"height": {
"const": "68"
}
},
"required": ["id", "name", "ethnicity", "location", "height"]
},
{
"type": "object",
"properties": {
"id": {
"const": "fDnweF"
},
"name": {
"const": "Kapil"
},
"location": {
"const": "goa"
},
"height": {
"const": "65"
}
},
"required": ["id", "name", "location", "height"]
},
{
"type": "object",
"properties": {
"id": {
"const": "xSiIDj"
},
"name": {
"const": "Arnaud"
},
"location": {
"const": "paris"
},
"relationshipStatus": {
"const": "single"
}
},
"required": ["id", "name", "location", "relationshipStatus"]
},
{
"type": "object",
"properties": {
"id": {
"const": "uLqbVe"
},
"relationshipStatus": {
"const": "married"
}
},
"required": ["id", "relationshipStatus"]
}
]
}
}
},
"required": ["people"]
}
Note that for Bob, I only want to check that his name in the records is Bob, his ethnicity is american and that his location and height are set properly.
For Kapil, notice that there are 2 of them in the record. I only want to validate the array object pertaining to Kapil with the id fDnweF.
And for Jim, I only want to make sure that his relationshipStatus is set to married.
So my question is: is there any way in JSON Schema to say, "when you come across an array of objects, instead of running validation across every element in the data, only run it against the objects that match a specific identifier"? In our case, the identifier is id. You can imagine that this identifier could be anything; for example, it could have been socialSecurity# if the list of people were all from America.
The issue with the current schema is that when it tries to validate the objects, it generates a giant list of errors with no clear indication of which object failed with which value.
Ideally, AJV (which I currently use) would generate errors that look something like:
---------Bob-------------
path: people[0].location
expected: "los angeles"
// Notice how this isn't Kapil at index 2 since we provided the id which matches kapil at index 4
---------Kapil-----------
path: people[4].location
expected: "goa"
---------Kapil-----------
path: people[4].height
expected: "65"
---------Arnaud----------
path: people[3].relationshipStatus
expected: "single"
-----------Jim-----------
path: people[6].relationshipStatus
expected: "married"
Instead, AJV currently spits out errors with no clear indication of where the failure is. If Bob fails to match the expected location, it reports that every person, including Bob, has an invalid location, which from our perspective is incorrect.
How can I define a schema that handles this use case, so that JSON Schema pinpoints which elements in our data aren't in compliance with the schema? We want to store these schema errors cleanly for reporting purposes and come back to these reports to see exactly which people (represented by array index) failed which values.
Edit:
Assume that we would also like to check Bob's relatives. For instance, we want a schema that checks that the relative with a given ID has location: "los angeles", and another whose location is "orange county".
{
"people": [{
"name": "Bob",
"id": "ei75d0",
"relationshipStatus": "married",
"height": "68",
"relatives": [
{
"name": "Tony",
"id": "UDX5A6",
"location": "los angeles"
},
{
"name": "Lisa",
"id": "WCX4AG",
"location": "orange county"
}
]
}]
}
My question then is: can if/then/else be applied to nested elements as well? I'm not having success yet, but I'll keep trying and will post an update here if/once I do.
How can I define a schema that can resolve this use-case and we can use JSON Schema to pinpoint which elements in our data aren't in compliance with what our schema states
It's a little fiddly, but I've gone from "this isn't possible" to "you can just about do this".
If you re-structure your schema to the following...
{
"$schema": "https://json-schema.org/draft/2019-09/schema",
"type": "object",
"properties": {
"people": {
"type": "array",
"items": {
"allOf":[
{
"if": {
"properties": {
"id": {
"const": "uLqbVe"
}
}
},
"then": {
"type": "object",
"properties": {
"id": {
"const": "uLqbVe"
},
"relationshipStatus": {
"const": "married"
}
},
"required": ["id", "relationshipStatus"]
},
"else": true
}
]
}
}
},
"required": ["people"]
}
What we're doing here is, for each item in the array, if the object has the specific ID, then do the other validation, otherwise, it's valid.
It's wrapped in an allOf so you can do the same pattern multiple times.
The caveat is that, if you don't include all the IDs, or if you don't carefully check your schema, you will get told everything is valid.
Ideally, you should additionally check that the IDs you expect are actually there. (It's fine to do so in the same schema.)
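Writing one if/then branch per id by hand gets tedious, so the schema can be generated. Here is a minimal Python sketch (the helper names scoped_items, ids_present and people_schema are hypothetical) that builds the allOf/if/then pattern from a mapping of id to expected property subschemas, and adds one contains check per expected id so that a missing id fails instead of silently passing. Two small deviations from the schema above, both assumptions worth checking: the "if" also requires the id key, so objects without an id never match a branch, and "else": true is omitted because it is the default. Because the expected values are ordinary subschemas, the same helper can be nested, which covers the relatives case from the edit.

```python
import json

def scoped_items(id_key, expectations):
    """Build an "items" schema that only checks elements whose id_key matches.

    expectations maps an id value to {property_name: subschema}. Each entry
    becomes one if/then branch inside allOf; elements matching no branch pass.
    """
    branches = []
    for id_value, props in expectations.items():
        branches.append({
            # "required" in the "if" keeps objects without the id from matching.
            "if": {"properties": {id_key: {"const": id_value}}, "required": [id_key]},
            "then": {"type": "object", "properties": props, "required": sorted(props)},
        })
    return {"allOf": branches}

def ids_present(id_key, ids):
    """One "contains" per expected id, so a missing id is reported as an error."""
    return {"allOf": [
        {"contains": {"properties": {id_key: {"const": i}}, "required": [id_key]}}
        for i in ids
    ]}

def people_schema(expectations, id_key="id"):
    """Top-level schema for {"people": [...]} using the pattern above."""
    people = {"type": "array", "items": scoped_items(id_key, expectations)}
    people.update(ids_present(id_key, expectations))
    return {
        "$schema": "https://json-schema.org/draft/2019-09/schema",
        "type": "object",
        "properties": {"people": people},
        "required": ["people"],
    }

if __name__ == "__main__":
    schema = people_schema({
        "uLqbVe": {"relationshipStatus": {"const": "married"}},
        # Nested arrays (Bob's relatives) reuse the same helper.
        "ei75dO": {"relatives": {"type": "array", "items": scoped_items("id", {
            "UDX5A6": {"location": {"const": "los angeles"}},
            "WCX4AG": {"location": {"const": "orange county"}},
        })}},
    })
    print(json.dumps(schema, indent=2))
```

Since each branch is evaluated inside items, a failing const should be reported under the failing element's instance path (for example /people/6/relationshipStatus), though the exact error shape depends on the validator and its options.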
You can see this mostly working if you test it on https://jsonschema.dev by removing the $schema property. (This playground is only draft-07, but none of the keywords you use need anything above draft-07 anyway.)
You can test this working on https://json-everything.net/json-schema which then gives you full validation response.
AJV by default doesn't give you all the validation results. There's an option to enable this (allErrors), but I'm not in a position to test the result myself right now.

Azure Cost Management API does not allow me to select columns

I tried to use the Azure Cost Management - Query Usage API to get details (certain columns) on all costs for a given subscription. The body I use for the request is:
{
"type": "Usage",
"timeframe": "BillingMonthToDate",
"dataset": {
"granularity": "Daily",
"configuration": {
"columns": [
"MeterCategory",
"CostInBillingCurrency",
"ResourceGroup"
]
}
}
}
But the response I get back is this:
{
"id": "xxxx",
"name": "xxxx",
"type": "Microsoft.CostManagement/query",
"location": null,
"sku": null,
"eTag": null,
"properties": {
"nextLink": null,
"columns": [
{
"name": "UsageDate",
"type": "Number"
},
{
"name": "Currency",
"type": "String"
} ],
"rows": [
[
20201101,
"EUR"
],
[
20201102,
"EUR"
],
[
20201103,
"EUR"
],
...
]
}
}
The JSON continues listing all the dates with the currency.
When I use the dataset.aggregation or dataset.grouping clauses in the JSON, I do get costs returned, but then I don't get the detailed column information that I want. And it is not possible to combine these two clauses with the dataset.columns clause. Does anyone have any idea what I'm doing wrong?
I found a solution without using the dataset.columns clause (which might just be a faulty clause?). By grouping the data according to the columns I want, I can also get the data for those column values:
{
"type": "Usage",
"timeframe": "BillingMonthToDate",
"dataset": {
"granularity": "Daily",
"aggregation": {
"totalCost": {
"name": "PreTaxCost",
"function": "Sum"
}
},
"grouping": [
{
"type": "Dimension",
"name": "SubscriptionName"
},
{
"type": "Dimension",
"name": "ResourceGroupName"
},
{
"type": "Dimension",
"name": "meterSubCategory"
},
{
"type": "Dimension",
"name": "MeterCategory"
}
]
}
}
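If you build these requests from code, a small Python sketch like the following could generate the grouped body from a list of dimension names and turn the response's columns/rows table into one dict per row. The helper names are hypothetical, no HTTP call or authentication is shown, and the body simply mirrors the one above. If properties.nextLink is not null, the result is presumably paged and the remaining rows have to be fetched from that link.

```python
def build_grouped_query(dimensions, timeframe="BillingMonthToDate", granularity="Daily"):
    """Build a Query Usage body that sums PreTaxCost grouped by the given dimensions."""
    return {
        "type": "Usage",
        "timeframe": timeframe,
        "dataset": {
            "granularity": granularity,
            "aggregation": {"totalCost": {"name": "PreTaxCost", "function": "Sum"}},
            "grouping": [{"type": "Dimension", "name": d} for d in dimensions],
        },
    }

def rows_to_records(response):
    """Zip the column names from properties.columns onto each row in properties.rows."""
    props = response["properties"]
    names = [column["name"] for column in props["columns"]]
    return [dict(zip(names, row)) for row in props["rows"]]
```

For the first response in the question, rows_to_records would return entries such as {"UsageDate": 20201101, "Currency": "EUR"}; with grouping, the dimension columns and the aggregated cost appear as additional keys.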

Fiware Context Broker with entities geolocated

I have a problem in retrieving entities using georeferenced queries.
I'm using the v2 syntax.
This is my query:
GET /v2/entities?georel=near;maxDistance:1000&geometry=point&coords=13.52,43.61
and this is my entity:
{
"id": "p1",
"type": "pm",
"address": {
"type": "Text",
"value": "Via Roma "
},
"allowedVehicleType": {
"type": "Text",
"value": "car"
},
"category": {
"type": "Text",
"value": "onstreet"
},
"location": {
"type": "geo:json",
"value": {
"type": "Point",
"coordinates": [ 13.5094, 43.6246 ]
}
},
"name": {
"type": "Text",
"value": "p1"
},
"totalSpotNumber": {
"type": "Number",
"value": 32
}
}
What is wrong?
I followed the official documentation, but I still cannot get any results.
I also tried to reverse the coordinates, but the result does not change.
Any suggestion is welcome.
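Note that longitude comes before latitude in GeoJSON coordinates, while the coords parameter uses the opposite order (latitude,longitude).
Thus, assuming that your entity is located in Ancona, its "coordinates": [ 13.5094, 43.6246 ] are already in the correct GeoJSON order, and it is the query that needs to be reversed: coords=43.61,13.52. Also note that the entity is roughly 1.8 km from that point, so with maxDistance:1000 it would still not be returned; a larger distance (for example maxDistance:2000) is needed.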
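A quick way to sanity-check maxDistance is to compute the great-circle distance between the query point (latitude 43.61, longitude 13.52) and the entity (latitude 43.6246, longitude 13.5094). The sketch below uses the haversine formula on a spherical Earth with a mean radius of 6,371 km; that is an approximation, and Orion's underlying geo engine may compute slightly different values, but the order of magnitude is what matters here.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in metres between two (lat, lon) points, spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

# Query point vs. entity, both written as (latitude, longitude).
distance = haversine_m(43.61, 13.52, 43.6246, 13.5094)
print(f"{distance:.0f} m")
```

The result is somewhat more than 1,800 m, which is larger than maxDistance:1000 regardless of how the coordinates are ordered.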