Getting readable results from Wikidata - json

Ok so I'm trying to get information about movies from Wikidata. Take this movie, for example: https://www.wikidata.org/wiki/Q24871
On the page the data is clearly displayed in a readable format; however, when you try to extract it via the API you get this: https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q24871
Here is a section from it:
"P272": [
{
"id": "q24871$4721C959-0FCF-49D4-9265-E4FAC217CB6E",
"mainsnak": {
"snaktype": "value",
"property": "P272",
"datatype": "wikibase-item",
"datavalue": {
"value": {
"entity-type": "item",
"numeric-id": 775450
},
"type": "wikibase-entityid"
}
},
"type": "statement",
"rank": "normal"
},
{
"id": "q24871$31777445-1068-4C38-9B4B-96362577C442",
"mainsnak": {
"snaktype": "value",
"property": "P272",
"datatype": "wikibase-item",
"datavalue": {
"value": {
"entity-type": "item",
"numeric-id": 3041294
},
"type": "wikibase-entityid"
}
},
"type": "statement",
"rank": "normal"
},
{
"id": "q24871$08009F7A-8E54-48C3-92D9-75DEF4CF3E8D",
"mainsnak": {
"snaktype": "value",
"property": "P272",
"datatype": "wikibase-item",
"datavalue": {
"value": {
"entity-type": "item",
"numeric-id": 646968
},
"type": "wikibase-entityid"
}
},
"type": "statement",
"rank": "normal"
},
{
"id": "q24871$CA53B5EB-1041-4701-A36E-7C348FAC984E",
"mainsnak": {
"snaktype": "value",
"property": "P272",
"datatype": "wikibase-item",
"datavalue": {
"value": {
"entity-type": "item",
"numeric-id": 434841
},
"type": "wikibase-entityid"
}
},
"type": "statement",
"rank": "normal",
"references": [
{
"hash": "50f57a3dbac4708ce4ae4a827c0afac7fcdb4a5c",
"snaks": {
"P143": [
{
"snaktype": "value",
"property": "P143",
"datatype": "wikibase-item",
"datavalue": {
"value": {
"entity-type": "item",
"numeric-id": 11920
},
"type": "wikibase-entityid"
}
}
]
},
"snaks-order": [
"P143"
]
}
]
}
],
The problem is I'm not sure how to convert sections like that into readable text. I get that the API refers to the linked items and properties by their unique IDs, but I'm still stuck.
Is this actually possible at present or am I barking up the wrong tree?

What you should look for are the numeric-ids in each statement: add a leading Q to recover the Wikidata ids, which should give ['Q775450', 'Q3041294', 'Q646968', 'Q434841', 'Q11920']. (The first four are the P272 values; Q11920 comes from the P143 reference on the last statement.)
[update: you can now directly access the Q id at mainsnak.datavalue.value.id, instead of having to build it from the numeric-id]
This can be done with the wbk.simplify.claims function of wikibase-sdk (a JS lib I developed).
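If you are not working in JS, the same extraction is easy to sketch by hand. The Python below is not wikibase-sdk's API, just a minimal illustration assuming `claims` is the "claims" object of a wbgetentities response (property id mapped to a list of statements), shaped like the excerpt in the question. It prefers `value["id"]` when present and falls back to building the id from "numeric-id"; reference snaks such as P143 are only included on request.

```python
def item_ids_from_claims(claims, include_references=False):
    """Return the Q ids found in statement main snaks (optionally also in references)."""

    def snak_to_qid(snak):
        datavalue = snak.get("datavalue", {})  # "novalue"/"somevalue" snaks have none
        if datavalue.get("type") != "wikibase-entityid":
            return None  # strings, dates, quantities, ... are not item links
        value = datavalue["value"]
        # Newer responses carry "id" directly; older ones only "numeric-id".
        return value.get("id") or "Q{}".format(value["numeric-id"])

    ids = []
    for statements in claims.values():
        for statement in statements:
            qid = snak_to_qid(statement["mainsnak"])
            if qid:
                ids.append(qid)
            if include_references:
                for reference in statement.get("references", []):
                    for snaks in reference["snaks"].values():
                        for snak in snaks:
                            qid = snak_to_qid(snak)
                            if qid:
                                ids.append(qid)
    return ids


# Trimmed-down version of the excerpt in the question (two P272 statements).
claims = {
    "P272": [
        {"mainsnak": {"snaktype": "value", "property": "P272",
                      "datavalue": {"value": {"entity-type": "item", "numeric-id": 775450},
                                    "type": "wikibase-entityid"}}},
        {"mainsnak": {"snaktype": "value", "property": "P272",
                      "datavalue": {"value": {"entity-type": "item", "numeric-id": 434841},
                                    "type": "wikibase-entityid"}},
         "references": [{"snaks": {"P143": [
             {"snaktype": "value", "property": "P143",
              "datavalue": {"value": {"entity-type": "item", "numeric-id": 11920},
                            "type": "wikibase-entityid"}}]}}]},
    ]
}

print(item_ids_from_claims(claims))                           # ['Q775450', 'Q434841']
print(item_ids_from_claims(claims, include_references=True))  # ['Q775450', 'Q434841', 'Q11920']
```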
Once you have those ids, you just need to request the entities' labels using the wbgetentities API:
https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q775450|Q3041294|Q646968|Q434841|Q11920&format=json&props=labels
You can even restrict the results to some languages, using the languages parameter: https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q775450|Q3041294|Q646968|Q434841|Q11920&format=json&props=labels&languages=en|de|fr
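Building that request programmatically is mostly string joining: multi-valued parameters such as ids and languages are separated with "|". A small Python sketch using only the standard library; the helper names are made up for illustration, and no network call is made here. The second helper assumes the usual wbgetentities response shape ("entities", then "labels" per language, then "value").

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def labels_url(ids, languages=None):
    """Build a wbgetentities URL that fetches only the labels of `ids`."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(ids),  # multiple values are separated with "|"
        "format": "json",
        "props": "labels",
    }
    if languages:
        params["languages"] = "|".join(languages)
    # safe="|" leaves the separators readable; a percent-encoded %7C decodes the same way
    return WIKIDATA_API + "?" + urlencode(params, safe="|")


def labels_from_response(response, language="en"):
    """Map each entity id to its label in `language` (None if it has none)."""
    return {
        entity_id: entity.get("labels", {}).get(language, {}).get("value")
        for entity_id, entity in response["entities"].items()
    }


ids = ["Q775450", "Q3041294", "Q646968", "Q434841", "Q11920"]
print(labels_url(ids, languages=["en", "de", "fr"]))

# Shape of a response; the label text is a placeholder, not real data.
sample = {"entities": {"Q775450": {"id": "Q775450", "labels": {
    "en": {"language": "en", "value": "<English label>"}}}}}
print(labels_from_response(sample))  # {'Q775450': '<English label>'}
```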

I see an accepted answer, but I initially interpreted the question differently: as asking for the same output, in JSON, that one sees on the Wikidata item page.
SPARQL query with JSON output for above case:
https://query.wikidata.org/sparql?query=SELECT%20%3FwdLabel%20%3Fps_Label%20%3FwdpqLabel%20%3Fpq_Label%20%7B%0A%20%20VALUES%20(%3Fcompany)%20%7B(wd%3AQ24871)%7D%0A%0A%20%20%3Fcompany%20%3Fp%20%3Fstatement%20.%0A%20%20%3Fstatement%20%3Fps%20%3Fps_%20.%0A%0A%20%20%3Fwd%20wikibase%3Aclaim%20%3Fp.%0A%20%20%3Fwd%20wikibase%3AstatementProperty%20%3Fps.%0A%0A%20%20OPTIONAL%20%7B%0A%20%20%3Fstatement%20%3Fpq%20%3Fpq_%20.%0A%20%20%3Fwdpq%20wikibase%3Aqualifier%20%3Fpq%20.%0A%20%20%7D%0A%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22%20%7D%0A%7D&format=json
I use the Wikidata Query Service front end to get my query right and to check the results, then use its </> Code button to generate the URL, which explains all the URL-encoded whitespace (%20, %0A) above.
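If you would rather keep the query readable in your own code, you can store it as plain SPARQL and let the standard library do the encoding. The query below is the one from the URL above, decoded; the helper name is hypothetical. `urlencode` uses "+" for spaces instead of %20, which is the standard form encoding and decodes to the same query.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# The query embedded in the URL above, decoded back into readable SPARQL.
QUERY = """SELECT ?wdLabel ?ps_Label ?wdpqLabel ?pq_Label {
  VALUES (?company) {(wd:Q24871)}

  ?company ?p ?statement .
  ?statement ?ps ?ps_ .

  ?wd wikibase:claim ?p.
  ?wd wikibase:statementProperty ?ps.

  OPTIONAL {
  ?statement ?pq ?pq_ .
  ?wdpq wikibase:qualifier ?pq .
  }

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}"""


def sparql_url(query, endpoint="https://query.wikidata.org/sparql"):
    """Percent-encode a SPARQL query into a GET URL that asks for JSON results."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


url = sparql_url(QUERY)
print(url)

# Decoding the URL gives back exactly the readable query.
print(parse_qs(urlsplit(url).query)["query"][0] == QUERY)  # True
```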
See also:
wikidata get all properties with labels and values of an item
SPARQL query service - Interfacing

Ok so I haven't found a solution using the "wbgetentities" system, but I have found that you can use the "parse" action to get the HTML structure:
https://www.wikidata.org/w/api.php?action=parse&page=Q24871
While it's still going to need some processing, it's much easier than my previous approach.

Related

Is there any way to define a scoping mechanism in JSON Schema for Arrays of Objects?

I would like to use JSON Schema to validate my data which exists as an array of objects. In this use-case, I have a list of people and I want to make sure they possess certain properties, but these properties aren't exhaustive.
For instance, if we have a person named Bob, I want to make sure that Bob's height, ethnicity and location are set to certain values. But I don't care much about Bob's other properties like hobbies, weight or relationshipStatus.
There is one caveat: there can be multiple Bobs, and I don't want to check all of them. Each person has a unique ID, so I want to check the properties of a person by a specified id.
Here is an example of all the people that exist:
{
"people": [
{
"name": "Bob",
"id": "ei75dO",
"age": "36",
"height": "68",
"ethnicity": "american",
"location": "san francisco",
"weight": "174",
"relationshipStatus": "married",
"hobbies": ["camping", "traveling"]
},
{
"name": "Leslie",
"id": "UMZMA2",
"age": "32",
"height": "65",
"ethnicity": "american",
"location": "pawnee",
"weight": "139",
"relationshipStatus": "married",
"hobbies": ["politics", "parks"]
},
{
"name": "Kapil",
"id": "HkfmKh",
"age": "27",
"height": "71",
"ethnicity": "indian",
"location": "mumbai",
"weight": "166",
"relationshipStatus": "single",
"hobbies": ["tech", "games"]
},
{
"name": "Arnaud",
"id": "xSiIDj",
"age": "42",
"height": "70",
"ethnicity": "french",
"location": "paris",
"weight": "183",
"relationshipStatus": "married",
"hobbies": ["cooking", "reading"]
},
{
"name": "Kapil",
"id": "fDnweF",
"age": "38",
"height": "67",
"ethnicity": "indian",
"location": "new delhi",
"weight": "159",
"relationshipStatus": "married",
"hobbies": ["tech", "television"]
},
{
"name": "Gary",
"id": "ZX43NI",
"age": "29",
"height": "69",
"ethnicity": "british",
"location": "london",
"weight": "172",
"relationshipStatus": "single",
"hobbies": ["parkour", "guns"]
},
{
"name": "Jim",
"id": "uLqbVe",
"age": "26",
"height": "72",
"ethnicity": "american",
"location": "scranton",
"weight": "179",
"relationshipStatus": "single",
"hobbies": ["parkour", "guns"]
}
]
}
And here is what I specifically want to check for in each person:
{
"$schema": "https://json-schema.org/draft/2019-09/schema",
"type": "object",
"properties": {
"people": {
"type": "array",
"contains": {
"anyOf": [
{
"type": "object",
"properties": {
"id": {
"const": "ei75dO"
},
"name": {
"const": "Bob"
},
"ethnicity": {
"const": "american"
},
"location": {
"const": "los angeles"
},
"height": {
"const": "68"
}
},
"required": ["id", "name", "ethnicity", "location", "height"]
},
{
"type": "object",
"properties": {
"id": {
"const": "fDnweF"
},
"name": {
"const": "Kapil"
},
"location": {
"const": "goa"
},
"height": {
"const": "65"
}
},
"required": ["id", "name", "location", "height"]
},
{
"type": "object",
"properties": {
"id": {
"const": "xSiIDj"
},
"name": {
"const": "Arnaud"
},
"location": {
"const": "paris"
},
"relationshipStatus": {
"const": "single"
}
},
"required": ["id", "name", "location", "relationshipStatus"]
},
{
"type": "object",
"properties": {
"id": {
"const": "uLqbVe"
},
"relationshipStatus": {
"const": "married"
}
},
"required": ["id", "relationshipStatus"]
}
]
}
}
},
"required": ["people"]
}
Note that for Bob, I only want to check that his name in the records is Bob, his ethnicity is american and that his location and height are set properly.
For Kapil, notice that there are 2 of them in the record. I only want to validate the array object pertaining to Kapil with the id fDnweF.
And for Jim, I only want to make sure that his relationshipStatus is set to married.
So my question is: is there any way in JSON Schema to say "when you come across an array of objects, instead of running validation against each element, only run it against the objects that match a specific identifier"? In our case the identifier would be id, but you can imagine it being anything; for example, it could have been socialSecurity# if the people were all from America.
The issue with the current schema is that when it tries to validate the objects, it generates a giant list of errors with no clear indication of which object failed with which value.
In an ideal scenario, AJV (which I currently use) would generate errors that look something like:
---------Bob-------------
path: people[0].location
expected: "los angeles"
// Notice how this isn't Kapil at index 2 since we provided the id which matches kapil at index 4
---------Kapil-----------
path: people[4].location
expected: "goa"
---------Kapil-----------
path: people[4].height
expected: "65"
---------Arnaud----------
path: people[3].relationshipStatus
expected: "single"
-----------Jim-----------
path: people[6].relationshipStatus
expected: "married"
Instead, AJV currently spits out errors with no clear indication of where the failure is. If Bob failed to match the expected location, it reports that every person, including Bob, has an invalid location, which from our perspective is incorrect.
How can I define a schema that resolves this use case, so that JSON Schema pinpoints which elements in our data don't comply with what the schema states? The goal is to store these schema errors cleanly for reporting, and to come back to these reports to see exactly which people (represented by their array index) failed which values.
Edit:
Assume that we would also like to check Bob's relatives. For instance, we want the schema to check that the relative with a given ID is ALSO set to location: "los angeles", and another one to "orange county".
{
"people": [{
"name": "Bob",
"id": "ei75d0",
"relationshipStatus": "married",
"height": "68",
"relatives": [
{
"name": "Tony",
"id": "UDX5A6",
"location": "los angeles"
},
{
"name": "Lisa",
"id": "WCX4AG",
"location": "orange county"
}
]
}]
}
My question then would be: can if/then/else be applied to nested elements as well? I'm not having success yet, but I'll keep trying and will post an update here if/once I get it to work.
How can I define a schema that can resolve this use-case and we can use JSON Schema to pinpoint which elements in our data aren't in compliance with what our schema states
It's a little fiddly, but I've gone from "this isn't possible" to "you can just about do this".
If you re-structure your schema to the following...
{
"$schema": "https://json-schema.org/draft/2019-09/schema",
"type": "object",
"properties": {
"people": {
"type": "array",
"items": {
"allOf":[
{
"if": {
"properties": {
"id": {
"const": "uLqbVe"
}
}
},
"then": {
"type": "object",
"properties": {
"id": {
"const": "uLqbVe"
},
"relationshipStatus": {
"const": "married"
}
},
"required": ["id", "relationshipStatus"]
},
"else": true
}
]
}
}
},
"required": ["people"]
}
What we're doing here is, for each item in the array, if the object has the specific ID, then do the other validation, otherwise, it's valid.
It's wrapped in an allOf so you can do the same pattern multiple times.
The caveat is that, if you don't include all the IDs, or if you don't carefully check your schema, you will get told everything is valid.
Ideally, you should additionally check that the IDs you are expecting are actually there. (It's fine to do so in the same schema.)
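Writing one if/then block per ID by hand gets tedious, so here is a sketch that generates the restructured schema from a mapping of ID to expected values (the mapping format and function name are hypothetical). It also adds one "contains" clause per ID at the array level, which covers the "check that the IDs are actually there" part; "contains" exists since draft-06, so it also works in a draft-07 playground. Adding "required": ["id"] to each "if" is a small deviation from the schema above: without it, an item that has no id at all would pass the "if" and be forced through the "then".

```python
import json


def build_people_schema(expectations):
    """Build the allOf of if/then rules for every id in `expectations`.

    `expectations` is a hypothetical mapping: id -> {property: expected value}.
    """
    rules = []     # applied to every array item
    presence = []  # one "contains" per id, so a missing id is reported
    for person_id, expected in expectations.items():
        id_matches = {"properties": {"id": {"const": person_id}}, "required": ["id"]}
        props = {"id": {"const": person_id}}
        props.update({name: {"const": value} for name, value in expected.items()})
        rules.append({
            "if": id_matches,
            "then": {"type": "object", "properties": props, "required": list(props)},
            # no "else": it defaults to true, i.e. every other item is valid
        })
        presence.append({"contains": id_matches})
    return {
        "$schema": "https://json-schema.org/draft/2019-09/schema",
        "type": "object",
        "properties": {
            "people": {
                "type": "array",
                "items": {"allOf": rules},
                "allOf": presence,
            }
        },
        "required": ["people"],
    }


schema = build_people_schema({
    "uLqbVe": {"relationshipStatus": "married"},
    "fDnweF": {"name": "Kapil", "location": "goa", "height": "65"},
})
print(json.dumps(schema, indent=2))
```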
You can see this mostly working if you test it on https://jsonschema.dev by removing the $schema property. (This playground is only draft-07, but none of the keywords you use need anything above draft-07 anyway.)
You can test this working on https://json-everything.net/json-schema which then gives you full validation response.
AJV by default doesn't give you all the validation results; it stops at the first error. The allErrors option makes it collect all of them, but I'm not in a position to test the result myself right now.
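If you need exactly the per-person report shown in the question, one option is to apply the same "match by id, then check these values" rule outside JSON Schema, where you fully control the error format. This is a swapped-in technique, not something Ajv or the schema above produces; the function name and the expectations format are hypothetical. Run on the question's data (trimmed to the checked fields), it yields the five errors the question lists, with the correct indexes.

```python
def check_people(people, expectations):
    """Return (name, path, expected, actual) for every mismatch, plus missing ids.

    `expectations` is a hypothetical mapping: id -> {property: expected value}.
    """
    errors = []
    seen = set()
    for index, person in enumerate(people):
        expected = expectations.get(person.get("id"))
        if expected is None:
            continue  # not one of the people we were asked to check
        seen.add(person["id"])
        for prop, value in expected.items():
            actual = person.get(prop)
            if actual != value:
                errors.append((person.get("name"), f"people[{index}].{prop}", value, actual))
    for missing_id in sorted(expectations.keys() - seen):
        errors.append((None, "people", f"an item with id {missing_id}", None))
    return errors


# The people from the question, trimmed to the fields that are checked.
people = [
    {"name": "Bob", "id": "ei75dO", "height": "68", "ethnicity": "american", "location": "san francisco"},
    {"name": "Leslie", "id": "UMZMA2"},
    {"name": "Kapil", "id": "HkfmKh", "height": "71", "location": "mumbai"},
    {"name": "Arnaud", "id": "xSiIDj", "location": "paris", "relationshipStatus": "married"},
    {"name": "Kapil", "id": "fDnweF", "height": "67", "location": "new delhi"},
    {"name": "Gary", "id": "ZX43NI"},
    {"name": "Jim", "id": "uLqbVe", "relationshipStatus": "single"},
]
expectations = {
    "ei75dO": {"name": "Bob", "ethnicity": "american", "location": "los angeles", "height": "68"},
    "fDnweF": {"name": "Kapil", "location": "goa", "height": "65"},
    "xSiIDj": {"name": "Arnaud", "location": "paris", "relationshipStatus": "single"},
    "uLqbVe": {"relationshipStatus": "married"},
}

for name, path, expected, actual in check_people(people, expectations):
    print(f"---------{name}---------")
    print(f"path: {path}")
    print(f"expected: {expected!r} (found {actual!r})")
```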

Json schema to validate object's values against content of another object

I'm trying to create a JSON schema for a document where field values in some object should validate against an enum defined in another object in the same document.
More specifically, in the example below, I'd like to be able to define "properties" with id and values (I should be able to define different properties in different json files).
Then "objects" should be able to refer to these properties, so that object.properties[i].id must match with id of one of the properties and object.properties[i].value must match with one of the enum values defined for that property.
{
"properties": [
{
"id": "SIZE",
"values": ["small", "medium", "big"]
},
{
"id": "MATERIAL",
"values": ["wood", "glass", "steel", "plastic"]
},
{
"id": "COLOR",
"values": ["red", "green", "blue"]
}
],
"objects": [
{
"name": "chair",
"properties": [
{
"id": "SIZE",
"value": "small"
},
{
"id": "COLOR",
"value": "red"
}
]
},
{
"name": "table",
"properties": [
{
"id": "MATERIAL",
"value": "wood"
}
]
}
]
}
I tried to create a JSON schema to validate such a structure, but got stuck describing the reference to the inner fields of the "property" objects. I also looked into the standard and did not find a way to achieve the goal.
Is it possible to create a json schema which would validate my json files?
There is a proposal for a $data reference that almost allows this, if you change your data structure a little to remove one level of indirection. It is supported in Ajv (I am the author).
So if your data were:
{
"properties": {
"SIZE": ["small", "medium", "big"],
"MATERIAL": ["wood", "glass", "steel", "plastic"],
"COLOR": ["red", "green", "blue"]
},
"objects": {
"chair": {
"SIZE": "small",
"COLOR": "red"
},
"table": {
"MATERIAL": "wood"
}
}
}
then your schema could have been:
{
"type": "object",
"properties": {
"properties": {
"type": "object",
"additionalProperties": {
"type": "array",
"items": { "type": "string" }
}
},
"objects": {
"type": "object",
"additionalProperties": {
"type": "object",
"properties": {
"SIZE": {"enum": {"$data": "3/properties/SIZE"}},
"MATERIAL": {"enum": {"$data": "3/properties/MATERIAL"}},
"COLOR": {"enum": {"$data": "3/properties/COLOR"}}
}
}
}
}
}
And the schema could be generated dynamically from the list of all possible properties.
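A sketch of that generation step in Python, assuming the restructured data above; the function name is hypothetical. Generating the "$data" pointers from the property names avoids copy-paste slips between a key and its pointer. "3/properties/NAME" is a relative JSON pointer: from the value, one level up is the object (e.g. "chair"), two is "objects", three is the document root. The "additionalProperties": False line is an addition of this sketch, so that property ids which are not defined are rejected. $data is an Ajv feature, not standard JSON Schema, and has to be enabled with Ajv's $data option.

```python
import json


def build_data_schema(property_names):
    """Generate the $data-based schema for the restructured document."""
    return {
        "type": "object",
        "properties": {
            "properties": {
                "type": "object",
                "additionalProperties": {"type": "array", "items": {"type": "string"}},
            },
            "objects": {
                "type": "object",
                "additionalProperties": {
                    "type": "object",
                    "properties": {
                        # relative JSON pointer: value -> its object -> "objects" -> root
                        name: {"enum": {"$data": "3/properties/" + name}}
                        for name in property_names
                    },
                    # addition in this sketch: reject property ids that are not defined
                    "additionalProperties": False,
                },
            },
        },
    }


document = {
    "properties": {
        "SIZE": ["small", "medium", "big"],
        "MATERIAL": ["wood", "glass", "steel", "plastic"],
        "COLOR": ["red", "green", "blue"],
    },
}
print(json.dumps(build_data_schema(document["properties"]), indent=2))
```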
With your current data structure, you can either use custom keywords, if the validator supports them, or implement part of the validation logic outside of JSON Schema.
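For the "outside of JSON Schema" route, the cross-reference check on the original structure is only a few lines. This is a hypothetical helper in Python, meant to run after ordinary schema validation has confirmed the basic shape; it checks that each objects[i].properties[j].id is a defined property and that its value is one of that property's values.

```python
def validate_objects(document):
    """Check objects[i].properties[j] against the top-level "properties" definitions.

    Returns a list of error strings (empty when the document is valid).
    """
    allowed = {prop["id"]: set(prop["values"]) for prop in document.get("properties", [])}
    errors = []
    for i, obj in enumerate(document.get("objects", [])):
        for j, prop in enumerate(obj.get("properties", [])):
            path = f"objects[{i}].properties[{j}]"
            if prop.get("id") not in allowed:
                errors.append(f"{path}.id: {prop.get('id')!r} is not a defined property")
            elif prop.get("value") not in allowed[prop["id"]]:
                choices = sorted(allowed[prop["id"]])
                errors.append(f"{path}.value: {prop.get('value')!r} is not one of {choices}")
    return errors


# The document from the question.
doc = {
    "properties": [
        {"id": "SIZE", "values": ["small", "medium", "big"]},
        {"id": "MATERIAL", "values": ["wood", "glass", "steel", "plastic"]},
        {"id": "COLOR", "values": ["red", "green", "blue"]},
    ],
    "objects": [
        {"name": "chair", "properties": [{"id": "SIZE", "value": "small"},
                                         {"id": "COLOR", "value": "red"}]},
        {"name": "table", "properties": [{"id": "MATERIAL", "value": "wood"}]},
    ],
}
print(validate_objects(doc))  # []
```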

JSON schema verification

I have to create a form using JSON.
So as a first step, I need to verify the JSON against a schema.
Here is a part of my JSON:
"elements":{
"textbox":{
"name":{
"type":"text",
"name":"textbox",
"label":"Enter Your Name",
"required":false,
"disabled":false,
"maxlength":"",
"pattern":"",
"readonly":false,
"value":"",
"autocomplete":"off"
},
"school":{
"type":"text",
"name":"textbox",
"label":"F",
"required":false,
"disabled":false,
"maxlength":"",
"pattern":"",
"readonly":false,
"value":"",
"autocomplete":"off"
}
...
...
...
}
So inside "elements" there is a "textbox" object, and whoever writes the JSON can define any number of textbox fields inside it for the form.
I need to write a JSON Schema to verify this data; specifically, I need to know how to handle this "elements" part. Should I define it as an array inside an array, or as an object?
Well, I do suggest that you define textbox as an array. This way, you could set different parameters for the objects in your array and then you would be able to verify the data this way.
Here is a little example of what I am talking about:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"definitions": {},
"id": "example",
"properties": {
"elements": {
"id": "/properties/elements",
"properties": {
"textbox": {
"id": "/properties/elements/properties/textbox",
"items": {
"id": "/properties/elements/properties/textbox/items",
"properties": {
"Parameter1": {
"id": "/properties/elements/properties/textbox/items/properties/Parameter1",
"type": "string"
},
"Parameter2": {
"id": "/properties/elements/properties/textbox/items/properties/Parameter2",
"type": "number"
},
"Parameter3": {
"id": "/properties/elements/properties/textbox/items/properties/Parameter3",
"type": "integer"
}
},
"type": "object"
},
"type": "array"
}
},
"type": "object"
}
},
"type": "object"
}
This way, the user can input as many textboxes as they want and you can still use the same schema to verify the JSON.
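If you would rather keep "textbox" as an object keyed by field name, as in the question, "additionalProperties" (available in draft-04) applies one subschema to every key, whatever it is called. The sketch below generates such a schema in Python; the field types are read off the "name" and "school" samples in the question, and the "required" list is a hypothetical choice you would adjust.

```python
import json

# Types read off the sample entries in the question
# (maxlength and pattern are empty strings there, so they are typed as strings).
TEXTBOX_FIELDS = {
    "type": "string",
    "name": "string",
    "label": "string",
    "required": "boolean",
    "disabled": "boolean",
    "maxlength": "string",
    "pattern": "string",
    "readonly": "boolean",
    "value": "string",
    "autocomplete": "string",
}

textbox_entry = {
    "type": "object",
    "properties": {field: {"type": t} for field, t in TEXTBOX_FIELDS.items()},
    "required": ["type", "name", "label"],  # hypothetical choice of mandatory keys
}

schema = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "properties": {
        "elements": {
            "type": "object",
            "properties": {
                "textbox": {
                    "type": "object",
                    # every key ("name", "school", ...) must hold a valid textbox entry
                    "additionalProperties": textbox_entry,
                }
            },
        }
    },
}

print(json.dumps(schema, indent=2))
```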

JSON Hyper-Schema: different schemas for GET and POST

I want to describe an API with fields whose values can be supplied in different ways when POSTing an item, but which are only ever output in one specific way.
For example, I might want to describe an API where an item can be created or updated like this: {"name": "Task", "due": "2014-12-31"} or like this: {"name": "Task", "due": {"$date": 1419984000000}}, but it is only ever returned from the API in the first way.
The schema for POST/PUT could therefore be:
{
"type": "object",
"properties": {
"name": {
"type": "string"
},
"due": {
"oneOf": [
{
"type": "string",
"format": "date"
},
{
"type": "object",
"properties": {
"$date": {
"type": "number"
}
},
"required": ["$date"],
"additionalProperties": false
}
]
}
}
}
Whereas the schema for access via GET would be much simpler:
{
"type": "object",
"properties": {
"name": {
"type": "string"
},
"due": {
"type": "string",
"format": "date"
}
}
}
It would be good for consumers of the API to know that they only have to account for one possible output format rather than all of them.
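To make the asymmetry concrete: the server accepts both POST forms and always emits the single GET form. The Python helper below is hypothetical; it assumes "$date" holds milliseconds since the Unix epoch in UTC, which matches the example in the question, since 1419984000000 ms is 2014-12-31T00:00:00Z.

```python
from datetime import datetime, timezone


def normalize_due(due):
    """Collapse either accepted POST form of "due" into the single GET form.

    Hypothetical server-side helper; "$date" is assumed to be milliseconds
    since the Unix epoch, interpreted in UTC.
    """
    if isinstance(due, str):
        # {"type": "string", "format": "date"}: parse to reject malformed dates
        return datetime.strptime(due, "%Y-%m-%d").date().isoformat()
    if isinstance(due, dict) and set(due) == {"$date"} and isinstance(due["$date"], (int, float)):
        # {"$date": <number>} with no other properties, as in the oneOf branch
        moment = datetime.fromtimestamp(due["$date"] / 1000, tz=timezone.utc)
        return moment.date().isoformat()
    raise ValueError(f"unsupported due value: {due!r}")


print(normalize_due("2014-12-31"))              # 2014-12-31
print(normalize_due({"$date": 1419984000000}))  # 2014-12-31
```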
Is there any accepted standard approach to specifying different schemas within the context of JSON Hyper-Schema? I've thought about specifying these differences via the "links" property, but I don't know what "rel" I would define these schemas under, and it seems very non-standard.
If I understood correctly, you want to specify one schema per operation, and you can do that with standard hyper-schema. Let's look at an example for a POST operation:
{
"description": "create an item.",
"href": "/items",
"method": "POST",
"rel": "create",
"schema": {
"$ref": "#/api/createitem"
},
"title": "Create an item"
}
The actual schema that is required is referenced in "schema" property through "$ref".
If you also want to describe the response, you can use the "targetSchema" property. Be aware that targetSchema is advisory only, as explained in the docs.

Schema to load json data to google big query

I have a question for the project that we are doing...
I tried to load this JSON into Google BigQuery, but I'm not able to get the fields of the votes object from the JSON input. I tried both the "record" and the "string" types in the schema.
{
"votes": {
"funny": 10,
"useful": 10,
"cool": 10
},
"user_id": "OlMjqqzWZUv2-62CSqKq_A",
"review_id": "LMy8UOKOeh0b9qrz-s1fQA",
"stars": 4,
"date": "2008-07-02",
"text": "This is what this 4-star bar is all about.",
"type": "review",
"business_id": "81IjU5L-t-QQwsE38C63hQ"
}
Also, I am not able to get the tables populated from the JSON below because of the categories and neighborhoods arrays. What should my schema be for these inputs? Unfortunately the docs didn't help much in this case, or maybe I'm not looking in the right place.
{
"business_id": "Iu-oeVzv8ZgP18NIB0UMqg",
"full_address": "3320 S Hill St\nSouth East LA\nLos Angeles, CA 90007",
"schools": [
"University of Southern California"
],
"open": true,
"categories": [
"Medical Centers",
"Health and Medical"
],
"neighborhoods": [
"South East LA"
]
}
I am able to get the regular fields, but that's about it... Any help is appreciated!
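For business, the arrays (schools, categories and neighborhoods) should be repeated fields. Your schema should be:
"schema": {
"fields": [
{
"name": "business_id",
"type": "string"
},
{
"name": "full_address",
"type": "string"
},
{
"name": "schools",
"type": "string",
"mode": "repeated"
},
{
"name": "open",
"type": "boolean"
},
{
"name": "categories",
"type": "string",
"mode": "repeated"
},
{
"name": "neighborhoods",
"type": "string",
"mode": "repeated"
}
]
}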
For votes it seems you want record. Your schema should be:
"schema": {
"fields": [
{
"name": "name",
"type": "string"
},
{
"name": "votes",
"type": "record",
"fields": [
{
"name": "funny",
"type": "integer"
},
{
"name": "useful",
"type": "integer"
},
{
"name": "cool",
"type": "integer"
}
]
}
]
}
Source
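Rather than writing these field lists by hand, you can derive a first draft from a sample record and then review it (BigQuery can also infer a schema itself with `bq load --autodetect`). The Python below is a rough sketch with hypothetical helper names: objects become RECORD with nested fields, arrays become REPEATED using the type of their first element, and everything else maps to a scalar type. Strings such as "2008-07-02" are left as STRING even though DATE might suit them better.

```python
import json


def bq_type(value):
    """Map a sample JSON value to a BigQuery column type (a rough guess)."""
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    if isinstance(value, dict):
        return "RECORD"
    return "STRING"


def infer_fields(record):
    """Build a BigQuery JSON schema field list from one sample record."""
    fields = []
    for name, value in record.items():
        mode = "NULLABLE"
        if isinstance(value, list):
            mode = "REPEATED"
            # assume homogeneous arrays; an empty sample falls back to STRING
            value = value[0] if value else ""
        field = {"name": name, "type": bq_type(value), "mode": mode}
        if isinstance(value, dict):
            field["fields"] = infer_fields(value)  # nested RECORD columns
        fields.append(field)
    return fields


business = {
    "business_id": "Iu-oeVzv8ZgP18NIB0UMqg",
    "schools": ["University of Southern California"],
    "open": True,
    "categories": ["Medical Centers", "Health and Medical"],
    "neighborhoods": ["South East LA"],
}
review = {"votes": {"funny": 10, "useful": 10, "cool": 10}, "stars": 4}
print(json.dumps(infer_fields(business), indent=2))
print(json.dumps(infer_fields(review), indent=2))
```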
I was also stuck on this problem; in my case, the issue was that you have to remember to set the mode to repeated for the records (source).
Also note that repeated fields cannot contain null values (source).