Getting Wikipedia infobox data from Wikidata

I am trying to get Wikipedia infobox data from Wikidata's API for a number of companies. For example, Deliveroo:
https://www.wikidata.org/w/api.php?action=wbgetentities&format=jsonfm&sites=enwiki&titles=Deliveroo&props=info%7Clabels%7Cdescriptions%7Cclaims&languages=en
The JSON the API returns (actually JSON embedded in HTML in this case, because of format=jsonfm; use format=json for pure JSON) is missing some data from the Wikipedia page, such as "Industry: Online food ordering, Food delivery". Is there any way to find this data with Wikidata? Also, the data that is returned uses codes in place of attribute names. For example, for the "Founded" attribute in the Wikipedia infobox, Wikidata has:
"mainsnak": {
"snaktype": "value",
"property": "P571",
"hash": "7f617d23c9e1f8b6ce23c06baf4d3bdad9b4fbb9",
"datavalue": {
"value": {
"time": "+2013-00-00T00:00:00Z",
"timezone": 0,
"before": 0,
"after": 0,
"precision": 9,
"calendarmodel": "http://www.wikidata.org/entity/Q1985727"
},
"type": "time"
},
"datatype": "time"
},
I am guessing that "property": "P571" refers to the founded attribute, but I am not sure how to map these codes to the actual text names. Any help would be greatly appreciated.

Wikidata is not guaranteed to contain all the data that Wikipedia infoboxes do. Many Wikipedia communities have decided to consume Wikidata in their infoboxes, but not all of them (notably, the English Wikipedia is known for not using Wikidata data). Even Wikipedias that do use data from Wikidata don't need to use all of it, and they can still decide to fill in some of the data manually.
If you want to use only data from the infoboxes, perhaps https://dbpedia.org is a better option?
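On the second part of the question (mapping codes such as P571 to names): properties are Wikidata entities too, so the same wbgetentities module can return their labels when the IDs are passed in the ids parameter. Below is a minimal Python sketch of that lookup. The helper names label_lookup_url and extract_labels are made up for illustration, and sample_response is a hand-written stand-in in the shape the API returns, not live output.

```python
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"


def label_lookup_url(property_ids, language="en"):
    """Build a wbgetentities URL that asks only for the labels of the given IDs."""
    params = {
        "action": "wbgetentities",
        "format": "json",  # plain JSON, unlike format=jsonfm
        "ids": "|".join(property_ids),
        "props": "labels",
        "languages": language,
    }
    return API_URL + "?" + urlencode(params)


def extract_labels(response, language="en"):
    """Map each entity ID in a wbgetentities response to its label, if present."""
    labels = {}
    for entity_id, entity in response.get("entities", {}).items():
        label = entity.get("labels", {}).get(language)
        if label is not None:
            labels[entity_id] = label["value"]
    return labels


# Hand-written sample in the shape wbgetentities returns (assumption, not live data).
sample_response = {
    "entities": {
        "P571": {
            "type": "property",
            "id": "P571",
            "labels": {"en": {"language": "en", "value": "inception"}},
        }
    }
}

print(label_lookup_url(["P571"]))
print(extract_labels(sample_response))
```

Fetching the URL with any HTTP client and passing the decoded JSON to extract_labels gives a dictionary you can use to rename the property keys found under claims.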

Related

How To Get Steam Under 10 $ Game Json Api

I need the games priced under $10 on Steam. How should I query for this?
api link here
With the link above, I can fetch the games under $10 as JSON, but the response has no information such as the game ID or the price.
I need these. Thank you.
$steamUnder20 = Http::get("https://store.steampowered.com/search/results?maxprice=20&cc=tr&l=turkish&json=1");
$resultUnder = $steamUnder20->json();
Result :
"desc": "",
"items": [
{
"name": "Black Desert",
"logo": "https://cdn.akamai.steamstatic.com/steam/apps/582660/capsule_sm_120.jpg?t=1658233501"
},
{
"name": "Slime Rancher",
"logo": "https://cdn.akamai.steamstatic.com/steam/apps/433340/capsule_sm_120.jpg?t=1651003375"
},

Steam offers an endpoint called appdetails that allows you to fetch a whole lot of information about one or multiple comma-separated game IDs specified in the appids parameter:
http://store.steampowered.com/api/appdetails?appids=1,2,3
According to the unofficial documentation by RJackson in the Team Fortress wiki, you can specify the values you need in the filter parameter; otherwise the response will include all basic details: https://wiki.teamfortress.com/wiki/User:RJackson/StorefrontAPI#appdetails
In your case I suggest iterating through your response, parsing the game ID from each logo URL, and passing the IDs together to the appdetails endpoint to fetch all the information.
Additionally, you can also use the cc parameter to specify a currency and the l parameter to localise the response. These parameters are not documented in the documentation by RJackson.
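The suggestion above can be sketched roughly as follows, in Python rather than PHP so it runs without a framework. The helper names are hypothetical, the regular expression assumes the logo URLs always contain /steam/apps/<id>/ as in the question's sample, and the filters=price_overview parameter is my assumption about how to limit the response to prices; check it against the unofficial documentation before relying on it.

```python
import re
from urllib.parse import urlencode

APPDETAILS_URL = "https://store.steampowered.com/api/appdetails"

# Assumption: capsule image URLs look like ".../steam/apps/<id>/...".
APP_ID_PATTERN = re.compile(r"/steam/apps/(\d+)/")


def app_ids_from_items(items):
    """Collect the app IDs embedded in each item's logo URL, skipping misses."""
    ids = []
    for item in items:
        match = APP_ID_PATTERN.search(item.get("logo", ""))
        if match:
            ids.append(match.group(1))
    return ids


def appdetails_url(app_ids, country="tr"):
    """Build one appdetails URL for several IDs (parameter choice is an assumption)."""
    params = {
        "appids": ",".join(app_ids),
        "filters": "price_overview",
        "cc": country,
    }
    # safe="," keeps the comma-separated ID list readable in the query string.
    return APPDETAILS_URL + "?" + urlencode(params, safe=",")


# The two items shown in the question's search result.
items = [
    {
        "name": "Black Desert",
        "logo": "https://cdn.akamai.steamstatic.com/steam/apps/582660/capsule_sm_120.jpg?t=1658233501",
    },
    {
        "name": "Slime Rancher",
        "logo": "https://cdn.akamai.steamstatic.com/steam/apps/433340/capsule_sm_120.jpg?t=1651003375",
    },
]

ids = app_ids_from_items(items)
print(ids)
print(appdetails_url(ids))
```

The same two steps translate directly to PHP with preg_match and the Http client already used in the question.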

HATEOAS with HAL and links to embedded resources

I think the answer to this question is great because it explains a lot about HAL: How to handle nested resources with JSON HAL?
However, it does not fully answer the question (at least for me). Assume we have an /employees resource that returns a list of all employees. I want the employees embedded, but just with some basic information (not the full employee). This is OK according to the above answer and the spec. But what would my link look like?
So what would _links look like? Let's simplify the example. Assume there is no paging:
GET /employees
{
"_links": {
"self": { "href": "/employees" },
"employees": { "href": "/employees/{id}", "templated": true }
},
"_embedded": {
"employees": [{
"id": "1",
"fullname": "bla bli",
"_links": { ... }
},
{
"id": "2",
"fullname": "djsjsdj",
"_links": { ... }
}]
}
}
Does the templated "employees" URL make sense, or would this be a case where you would not use any entry in _links? And if the URL is OK: is it necessary that the template parameter (here "id") does match the attribute in the embedded employee objects?

My heuristic is to consider the analogs in HTML - if it's OK for a web page, then it will also be OK for HAL.
"employees": { "href": "/employees/{id}", "templated": true }
What's the HTML analog? It's a form with a GET action. Can we have a form with a get action on a web page that also has digests of the information that will be reached via the form? Of course. So it must be fine here.
is it necessary that the template parameter (here "id") does match the attribute in the embedded employee objects?
I don't think it's necessary (the machines don't really care), but it's going to make life easier for the humans, and that alone has value.
Imagine, if you will, reading the documentation of a schema, and discovering that the same semantic concept (an identifier for an employee) has two different names with unrelated spellings. I would guess that would (a) introduce avoidable errors in the documentation when authors get confused about which spelling context they are in and (b) that's the sort of inconsistency that would make me suspicious of the quality of the specification as a whole.
But it's not impossible to have tradeoffs, and other benefits that outweigh these liabilities.
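To make the point about names concrete, here is a small Python sketch of a client expanding the templated link for each embedded employee. It only handles plain {name} variables, not the full URI Template syntax, and the function names and the explicit variable-to-attribute mapping are illustrative assumptions rather than anything HAL prescribes.

```python
import re
from urllib.parse import quote


def expand_template(href, values):
    """Expand simple {name} variables in a templated link (no operators)."""
    def substitute(match):
        return quote(str(values[match.group(1)]), safe="")
    return re.sub(r"\{(\w+)\}", substitute, href)


def item_urls(resource, rel, variable, attribute):
    """Build one URL per embedded item, filling `variable` from the item's `attribute`."""
    link = resource["_links"][rel]
    if not link.get("templated"):
        return [link["href"]]
    return [
        expand_template(link["href"], {variable: item[attribute]})
        for item in resource["_embedded"][rel]
    ]


# The example from the question, with the elided per-item _links left out.
resource = {
    "_links": {
        "self": {"href": "/employees"},
        "employees": {"href": "/employees/{id}", "templated": True},
    },
    "_embedded": {
        "employees": [
            {"id": "1", "fullname": "bla bli"},
            {"id": "2", "fullname": "djsjsdj"},
        ]
    },
}

print(item_urls(resource, "employees", variable="id", attribute="id"))
```

The machine only needs the mapping passed as variable and attribute; when both are spelled "id", that mapping is obvious to whoever writes the client, which is the human benefit described above.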

Wikipedia API: how to parse content text into JSON?

EDIT
Not sure what to do because I realized the question I originally asked was irrelevant to what I really wanted, because I thought the descriptionurl and shortdescriptionurl from a Wikipedia API query of an image file would return text that described the image, but really they're just descriptions of the URL, so I feel dumb about that.
I tried to delete the question but it wouldn't let me, because there's already an answer.
So I'm going to change the question to what I really want to know, but now the answer that already exists will not make any sense, so this is kind of a mess but I don't know what to do about it.
What I actually wanted to know
When I do this:
https://en.wikipedia.org/w/api.php?action=query&pageids=18306940&prop=revisions&formatversion=2&rvprop=content
I get this:
{
"batchcomplete": true,
"query": {
"pages": [
{
"pageid": 18306940,
"ns": 6,
"title": "File:Rot-Weiss Essen Fans, May 2008.jpg",
"revisions": [
{
"contentformat": "text/x-wiki",
"contentmodel": "wikitext",
"content": "== Summary ==\n{{Information\n|Description=Fans of Rot-Weiss Essen are celebrating a 1-0 away victory against 1. FC Magdeburg in the 2007/08 Regionalliga Nord.\n|Source=I created this work entirely by myself.\n|Date=May 24, 2008\n|Author=[[User:Povldr|Povldr]] ([[User talk:Povldr|talk]])\n|other_versions=\n}}\n== Licensing: ==\n{{self|cc-by-sa-3.0|GFDL}}\n\n{{Copy to Wikimedia Commons|bot=Fbot|priority=true}}"
}
]
}
]
}
}
What I'd like to do is have the query return only these parts of the content:
Fans of Rot-Weiss Essen are celebrating a 1-0 away victory against 1. FC Magdeburg in the 2007/08 Regionalliga Nord. (the Description)
May 24, 2008 (the Date)
Povldr (the Author)
I could just get all that out of the content string by chopping up the string in C#, but is there any way to get it spit back to me formatted as nice little JSON in the first place?
I haven't been able to figure this out from The Wikipedia API page on the parse action, nor from the Wikipedia API Sandbox.
Can it be done?
Here is the old question, which was asking the wrong thing
title was: Wikipedia API: how do I use descriptionurl and shortdescriptionurl?
When I do this, for example:
https://en.wikipedia.org/w/api.php?action=query&list=allimages&aiprop=url&date&format=json&ailimit=1&aifrom=rot
...one of the pieces of JSON info is called "descriptionurl," and another is "shortdescriptionurl."
When I type those urls into a browser, it just takes me to the image's entire page.
How do I use those urls to get just the text of the actual description and short description?
Oh, and before you just type the link to the Wikipedia API, I have been trying to find out this information on there and failing. It's full of general information but I can't find this specific thing.

When I put your URL in a browser, I get some nice JSON as expected:
{
"warnings": {
"main": {
"*": "Unrecognized parameter: date."
}
},
"batchcomplete": "",
"continue": {
"aicontinue": "Rot-Weiss_Essen_logo.svg",
"continue": "-||"
},
"query": {
"allimages": [{
"name": "Rot-Weiss_Essen_Fans,_May_2008.jpg",
"url": "https://upload.wikimedia.org/wikipedia/en/5/5c/Rot-Weiss_Essen_Fans%2C_May_2008.jpg",
"descriptionurl": "https://en.wikipedia.org/wiki/File:Rot-Weiss_Essen_Fans,_May_2008.jpg",
"descriptionshorturl": "https://en.wikipedia.org/w/index.php?curid=18306940",
"ns": 6,
"title": "File:Rot-Weiss Essen Fans, May 2008.jpg"
}]
}
}
To extract an individual entry, you'll need to parse the JSON with your programming language of choice.
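For the edited question (pulling the Description, Date and Author out of the revision content): as far as I know, the revisions query only hands back the raw wikitext, so the template fields have to be separated on the client side. The following is a minimal Python sketch of that, assuming each field sits on its own line starting with "|" as in the sample. The helper names are made up, and the sample content is abridged from the response in the question.

```python
import json
import re

# One "|Name=value" pair per line; nested templates and multi-line values are not handled.
FIELD_PATTERN = re.compile(r"^\|[ \t]*(\w+)[ \t]*=[ \t]*(.*)$", re.MULTILINE)
# [[target|label]] or [[target]]; the visible text is captured in group 1.
WIKILINK_PATTERN = re.compile(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]")


def template_fields(wikitext):
    """Return the line-based template parameters found in the wikitext."""
    return {name: value.strip() for name, value in FIELD_PATTERN.findall(wikitext)}


def strip_wikilinks(text):
    """Replace wiki links with their visible text."""
    return WIKILINK_PATTERN.sub(r"\1", text)


# Abridged copy of the formatversion=2 response shown in the question.
response = {
    "query": {
        "pages": [
            {
                "revisions": [
                    {
                        "content": (
                            "== Summary ==\n{{Information\n"
                            "|Description=Fans of Rot-Weiss Essen are celebrating a 1-0 away "
                            "victory against 1. FC Magdeburg in the 2007/08 Regionalliga Nord.\n"
                            "|Source=I created this work entirely by myself.\n"
                            "|Date=May 24, 2008\n"
                            "|Author=[[User:Povldr|Povldr]] ([[User talk:Povldr|talk]])\n"
                            "|other_versions=\n}}\n"
                            "== Licensing: ==\n{{self|cc-by-sa-3.0|GFDL}}\n"
                        )
                    }
                ]
            }
        ]
    }
}

content = response["query"]["pages"][0]["revisions"][0]["content"]
fields = template_fields(content)
summary = {
    "description": fields["Description"],
    "date": fields["Date"],
    "author": strip_wikilinks(fields["Author"]),
}
print(json.dumps(summary, indent=2))
```

The same regular expressions work in C# with Regex and RegexOptions.Multiline. For wikitext that is less regular than this sample, a dedicated wikitext parser (mwparserfromhell is one for Python) is likely a safer choice than regular expressions.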

Microsoft Academic API, Knowledge graph search -- ReferenceIDs always empty

I'm using the graph search method of the Microsoft Academic API to retrieve citation IDs and reference IDs for a paper. However, while retrieving citation IDs works, the reference IDs field is always empty, even for papers which should have linked references. For example, retrieving this publication through the API:
POST https://westus.api.cognitive.microsoft.com/academic/v1.0/graph/search?mode=json
Content-Type: application/json
Host: westus.api.cognitive.microsoft.com
Ocp-Apim-Subscription-Key: my-api-key
{
"path": "/paper",
"paper": {
"select": [
"OriginalTitle",
"CitationIDs",
"ReferenceIDs"
],
"type": "Paper",
"id": [2059999322]
}
}
yields this response (I shortened the CitationIDs list for the sake of legibility):
{
"Results": [
[
{
"CellID": 2059999322,
"CitationIDs": "[630584464,2053566310,2239657960,...]",
"OriginalTitle": "Biodistribution of colloidal gold nanoparticles after intravenous administration: Effect of particle size",
"ReferenceIDs": ""
}
]
]
}
One thing I've noticed is that the graph schema provided here (at the bottom of the page) doesn't match the schema shown here (some of the attributes were renamed, e.g. NormalizedPaperTitle -> NormalizedTitle), so I thought the field was perhaps renamed to something else.
What is the correct query to get reference IDs through the API?

It should be ReferencesIDs, not ReferenceIDs

Mongo DB query of complex json structure

Say I have a json structure like so:
{
"A":{
"name":"dog",
"foo":"bar",
"array":[
{"name":"one"},
{"name":"two"}
]
},
"B":{
"name":"cat",
"foo":"bar",
"array":[
{"name":"one"},
{"name":"three"}
]
}
}
I want to be able to do two things.
1: Query for any "name":* within "A.array".
2: Query for any "name":"one" within "*.array".
That is, any object within a specific document's array, and any specific object within any document's array.
I hope I have used proper terminology here, I am just starting to familiarize myself with a lot of these concepts. I have tried searching for an answer but am having trouble finding something like my case.
Thanks.
EDIT:
Since I still haven't really made progress on this, I'll just explain what I'm trying to do: I want to use the "AllSets" dataset (after I trim it down below 16 MB) available on mtgjson.com. I am having problems getting mongo to play nicely, though.
In an effort to try and learn what's going on, I have downloaded one set: http://mtgjson.com/json/OGW.json.
Here is a photo of its structure laid out:
I am unable to even get mongo to return an object from within the cards array using:
"find({cards: {$elemMatch: {name:"Deceiver of Form"}}})"
"find({"cards.name":"Deceiver of Form"})"
When I run either of the commands above it just returns the entire document to me.

You could use the positional projection $ operator to limit the contents of an array. For example, if you have a single document like below:
{
"block": "Battle for Zendikar",
"booster": "...",
"translations": "...",
"cards": [
{
"name": "Deceiver of Form",
"power": "8"
},
{
"name": "Eldrazi Mimic",
"power": "2"
},
{
"name": "Kozilek, the Great Distortion",
"power": "12"
}
]
}
You can query for a card name matching "Deceiver of Form", and limit fields to return only the matching array card element(s) using:
> db.collection.find({"cards.name":"Deceiver of Form"}, {"cards.$":1})
{
"_id": ObjectId("..."),
"cards": [
{
"name": "Deceiver of Form",
"power": "8"
}
]
}
Having said the above, I think you should reconsider your data model. MongoDB is a document-oriented database, and a record in MongoDB is a document. Keeping everything in a single record does not use the potential of the database; it is similar to storing all data in a single row of a table.
You should try storing the cards in a collection of their own instead, where each document is a single card. Depending on your use case, you could add a reference to another collection containing the deck information (block, type, releaseDate, etc.). For example:
// a document in cards collection:
{
"name": "Deceiver of Form",
"power": "8",
"deck_id": 1
}
// a document in decks collection:
{
"deck_id": 1,
"releaseDate": "2016-01-22",
"type": "expansion"
}
For different types of data model designs and examples, please see Data Model Design.
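As a rough illustration of that restructuring, the Python sketch below splits one set document into a deck document and one document per card before anything is inserted. The function name split_set and the deck_id value are hypothetical, and the sample is the abridged document from above, not the real OGW.json.

```python
def split_set(set_doc, deck_id):
    """Return (deck_doc, card_docs) built from one set document."""
    # Everything except the cards array describes the deck itself.
    deck_doc = {key: value for key, value in set_doc.items() if key != "cards"}
    deck_doc["deck_id"] = deck_id
    # Each card becomes its own document carrying a reference to the deck.
    card_docs = [dict(card, deck_id=deck_id) for card in set_doc.get("cards", [])]
    return deck_doc, card_docs


set_doc = {
    "block": "Battle for Zendikar",
    "booster": "...",
    "translations": "...",
    "cards": [
        {"name": "Deceiver of Form", "power": "8"},
        {"name": "Eldrazi Mimic", "power": "2"},
        {"name": "Kozilek, the Great Distortion", "power": "12"},
    ],
}

deck_doc, card_docs = split_set(set_doc, deck_id=1)
print(deck_doc)
print(card_docs[0])

# With pymongo, the results could then be stored roughly like this
# (the collection names "decks" and "cards" are assumptions):
#   db.decks.insert_one(deck_doc)
#   db.cards.insert_many(card_docs)
```

Once the cards are separate documents, a plain find on the cards collection with {"name": "Deceiver of Form"} returns only that card, with no projection needed.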
