Wikipedia API: how to parse content text into JSON?

EDIT
I'm not sure what to do, because I realized the question I originally asked was not what I really wanted to know. I thought the descriptionurl and descriptionshorturl fields returned by a Wikipedia API query for an image file would give me text describing the image, but they are just URLs of the file's description page, so I feel a bit dumb about that.
I tried to delete the question, but it wouldn't let me because it already has an answer.
So I'm changing the question to what I really want to know. The existing answer won't make much sense anymore, which is a bit of a mess, but I don't know what else to do about it.
What I actually wanted to know
When I do this:
https://en.wikipedia.org/w/api.php?action=query&pageids=18306940&prop=revisions&formatversion=2&rvprop=content
I get this:
{
"batchcomplete": true,
"query": {
"pages": [
{
"pageid": 18306940,
"ns": 6,
"title": "File:Rot-Weiss Essen Fans, May 2008.jpg",
"revisions": [
{
"contentformat": "text/x-wiki",
"contentmodel": "wikitext",
"content": "== Summary ==\n{{Information\n|Description=Fans of Rot-Weiss Essen are celebrating a 1-0 away victory against 1. FC Magdeburg in the 2007/08 Regionalliga Nord.\n|Source=I created this work entirely by myself.\n|Date=May 24, 2008\n|Author=[[User:Povldr|Povldr]] ([[User talk:Povldr|talk]])\n|other_versions=\n}}\n== Licensing: ==\n{{self|cc-by-sa-3.0|GFDL}}\n\n{{Copy to Wikimedia Commons|bot=Fbot|priority=true}}"
}
]
}
]
}
}
What I'd like is for the query to return only these parts of the content:
Fans of Rot-Weiss Essen are celebrating a 1-0 away victory against 1. FC Magdeburg in the 2007/08 Regionalliga Nord. (the Description)
May 24, 2008 (the Date)
Povldr (the Author)
I could just extract all of that from the content string by chopping it up in C#, but is there any way to get it returned to me as nice little JSON in the first place?
I haven't been able to figure this out from the Wikipedia API documentation for the parse action, nor from the Wikipedia API Sandbox.
Can it be done?
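The API returns the page's raw wikitext, so one fallback is the client-side chopping the question already mentions. Below is a minimal Python sketch of that (rather than C#), assuming each {{Information}} parameter sits on its own line as it does on this page. `information_fields` and `strip_links` are hypothetical helper names, and a nested template or a multi-line value would defeat the regex, so a real wikitext parser (for example the mwparserfromhell library) is the more robust choice.

```python
import json
import re

# The "content" string from the API response above, hard-coded for this sketch.
CONTENT = (
    "== Summary ==\n{{Information\n"
    "|Description=Fans of Rot-Weiss Essen are celebrating a 1-0 away victory "
    "against 1. FC Magdeburg in the 2007/08 Regionalliga Nord.\n"
    "|Source=I created this work entirely by myself.\n"
    "|Date=May 24, 2008\n"
    "|Author=[[User:Povldr|Povldr]] ([[User talk:Povldr|talk]])\n"
    "|other_versions=\n}}\n== Licensing: ==\n{{self|cc-by-sa-3.0|GFDL}}\n\n"
    "{{Copy to Wikimedia Commons|bot=Fbot|priority=true}}"
)

# "|Name=value" at the start of a line. Assumes one template parameter per
# line, as on this page; this is not a general wikitext parser.
PARAM_RE = re.compile(r"^\|[ \t]*([\w ]+?)[ \t]*=(.*)$", re.MULTILINE)

# [[target|label]] -> label, [[target]] -> target
LINK_RE = re.compile(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]")


def strip_links(text: str) -> str:
    """Replace wiki links with their display text."""
    return LINK_RE.sub(r"\1", text).strip()


def information_fields(wikitext: str, wanted=("Description", "Date", "Author")) -> dict:
    """Pick the wanted |Name=value parameters out of the wikitext."""
    params = dict(PARAM_RE.findall(wikitext))
    return {name: strip_links(params.get(name, "")) for name in wanted}


if __name__ == "__main__":
    print(json.dumps(information_fields(CONTENT), indent=2))
```

With this naive link stripping, the Author field comes out as "Povldr (talk)"; trimming the trailing talk link is left to the caller.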
Here is the old question, which was asking the wrong thing
title was: Wikipedia API: how do I use descriptionurl and shortdescriptionurl?
When I do this, for example:
https://en.wikipedia.org/w/api.php?action=query&list=allimages&aiprop=url&date&format=json&ailimit=1&aifrom=rot
...one of the pieces of JSON info is called "descriptionurl" and another is "descriptionshorturl".
When I type those URLs into a browser, they just take me to the image's entire page.
How do I use those URLs to get just the text of the actual description and short description?
Oh, and before you just post a link to the Wikipedia API documentation: I have been trying to find this information there and failing. It's full of general information, but I can't find this specific thing.

When I put your URL in a browser, I get some nice JSON as expected:
{
"warnings": {
"main": {
"*": "Unrecognized parameter: date."
}
},
"batchcomplete": "",
"continue": {
"aicontinue": "Rot-Weiss_Essen_logo.svg",
"continue": "-||"
},
"query": {
"allimages": [{
"name": "Rot-Weiss_Essen_Fans,_May_2008.jpg",
"url": "https://upload.wikimedia.org/wikipedia/en/5/5c/Rot-Weiss_Essen_Fans%2C_May_2008.jpg",
"descriptionurl": "https://en.wikipedia.org/wiki/File:Rot-Weiss_Essen_Fans,_May_2008.jpg",
"descriptionshorturl": "https://en.wikipedia.org/w/index.php?curid=18306940",
"ns": 6,
"title": "File:Rot-Weiss Essen Fans, May 2008.jpg"
}]
}
}
To extract an individual entry, you'll need to parse the JSON with your programming language of choice.
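For example, in Python the standard json module turns the response into nested dicts and lists. A sketch using a trimmed, hard-coded copy of the response above (`description_urls` is a hypothetical helper name); note that descriptionurl only holds the URL of the file's description page, not any description text.

```python
import json

# Trimmed copy of the allimages response shown above.
RESPONSE_TEXT = """
{
  "batchcomplete": "",
  "query": {
    "allimages": [{
      "name": "Rot-Weiss_Essen_Fans,_May_2008.jpg",
      "url": "https://upload.wikimedia.org/wikipedia/en/5/5c/Rot-Weiss_Essen_Fans%2C_May_2008.jpg",
      "descriptionurl": "https://en.wikipedia.org/wiki/File:Rot-Weiss_Essen_Fans,_May_2008.jpg",
      "descriptionshorturl": "https://en.wikipedia.org/w/index.php?curid=18306940",
      "ns": 6,
      "title": "File:Rot-Weiss Essen Fans, May 2008.jpg"
    }]
  }
}
"""


def description_urls(response: dict) -> dict:
    """Map each image title to its description page URL."""
    return {img["title"]: img["descriptionurl"] for img in response["query"]["allimages"]}


data = json.loads(RESPONSE_TEXT)
for title, url in description_urls(data).items():
    print(title, "->", url)
```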

Related

HATEOAS with HAL and links to embedded resources

I think the answer to "How to handle nested resources with JSON HAL?" is great because it explains a lot about HAL.
However, it does not fully answer my question. Assume we have a /employees resource that returns a list of all employees. I want the employees embedded, but with only some basic information (not the full employee representation). According to that answer and the spec, this is OK. But what would my link look like?
So what would _links look like? Let's simplify the example and assume there is no paging:
GET /employees
{
"_links": {
"self": { "href": "/employees" },
"employees": { "href": "/employees/{id}", "templated": true }
},
"_embedded": {
"employees": [{
"id": "1",
"fullname": "bla bli",
"_links": { ... }
},
{
"id": "2",
"fullname": "djsjsdj",
"_links": { ... }
}]
}
}
Does the templated "employees" URL make sense, or is this a case where you would not use any entry in _links? And if the URL is OK: is it necessary that the template parameter (here "id") matches the attribute in the embedded employee objects?
My heuristic is to consider the analog in HTML: if it's OK for a web page, then it will also be OK for HAL.
"employees": { "href": "/employees/{id}", "templated": true }
What's the HTML analog? It's a form with a GET action. Can we have a form with a get action on a web page that also has digests of the information that will be reached via the form? Of course. So it must be fine here.
Is it necessary that the template parameter (here "id") matches the attribute in the embedded employee objects?
I don't think it's necessary (the machines don't really care), but it's going to make life easier for the humans, and that alone has value.
Imagine, if you will, reading the documentation of a schema and discovering that the same semantic concept (an identifier for an employee) has two different names with unrelated spellings. I would guess that this would (a) introduce avoidable errors in the documentation when authors get confused about which spelling context they are in, and (b) be the sort of inconsistency that would make me suspicious of the quality of the specification as a whole.
That said, there can be tradeoffs, and other benefits may outweigh these liabilities.
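To make the naming point concrete: when the template variable has the same name as the attribute, a client can fill the template straight from each embedded object. A small Python sketch under that assumption, with the elided per-employee _links left out; `expand` and `employee_urls` are hypothetical helpers, and `expand` only handles plain {var} placeholders without percent-encoding, far less than a real RFC 6570 URI Template library.

```python
import re

# The GET /employees example from the question (per-employee "_links" omitted).
DOC = {
    "_links": {
        "self": {"href": "/employees"},
        "employees": {"href": "/employees/{id}", "templated": True},
    },
    "_embedded": {
        "employees": [
            {"id": "1", "fullname": "bla bli"},
            {"id": "2", "fullname": "djsjsdj"},
        ]
    },
}

VAR_RE = re.compile(r"\{(\w+)\}")


def expand(template: str, values: dict) -> str:
    """Substitute {name} placeholders from values.

    Simplified subset of RFC 6570: no operators, no percent-encoding.
    """
    return VAR_RE.sub(lambda m: str(values[m.group(1)]), template)


def employee_urls(doc: dict) -> list:
    link = doc["_links"]["employees"]
    # Works only because the template variable "id" has the same name
    # as the attribute in each embedded employee.
    return [expand(link["href"], emp) for emp in doc["_embedded"]["employees"]]


print(employee_urls(DOC))
```

If the names differed, every client would need an extra, documented mapping from attribute to template variable, which is exactly the kind of avoidable inconsistency described above.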

Getting Wikipedia infobox data from Wikidata

I am trying to get Wikipedia infobox data from Wikidata's API for a number of companies. For example, Deliveroo:
https://www.wikidata.org/w/api.php?action=wbgetentities&format=jsonfm&sites=enwiki&titles=Deliveroo&props=info%7Clabels%7Cdescriptions%7Cclaims&languages=en
The JSON the API returns (actually JSON embedded in HTML in this case; use format=json for pure JSON) is missing some data from the Wikipedia page, such as "Industry: Online food ordering, Food delivery". Is there any way to find this data in Wikidata? Also, the returned data uses codes in place of attribute names. For example, for the "Founded" attribute in the Wikipedia infobox, Wikidata has:
"mainsnak": {
"snaktype": "value",
"property": "P571",
"hash": "7f617d23c9e1f8b6ce23c06baf4d3bdad9b4fbb9",
"datavalue": {
"value": {
"time": "+2013-00-00T00:00:00Z",
"timezone": 0,
"before": 0,
"after": 0,
"precision": 9,
"calendarmodel": "http://www.wikidata.org/entity/Q1985727"
},
"type": "time"
},
"datatype": "time"
},
I am guessing that "property": "P571" refers to the Founded attribute, but I am not sure how to map these codes to the actual text names. Any help would be greatly appreciated.
Wikidata is not guaranteed to contain all the data Wikipedia infoboxes do. Many Wikipedia communities decided to consume Wikidata in their infoboxes, but not all of them (notably, the English Wikipedia is known for not using Wikidata data). Even Wikipedias that do use data from Wikidata don't need to use all of it, and they can still decide to fill in some of the data manually.
If you want to use only data from the infoboxes, perhaps https://dbpedia.org is a better option?
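On the second part of the question (mapping codes to names): property IDs such as P571 are themselves Wikidata entities, so, assuming the usual wbgetentities behaviour, their labels can be requested from the same module with ids=P571&props=labels. The "precision": 9 in the snippet marks the time value as year-only, which is why month and day are 00. A Python sketch of both ideas; `property_label_url` and `format_wikidata_time` are hypothetical helpers, and no request is actually sent.

```python
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"


def property_label_url(*property_ids: str, language: str = "en") -> str:
    """Build a wbgetentities URL that fetches only the labels of properties."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(property_ids),  # several IDs are pipe-separated
        "props": "labels",
        "languages": language,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def format_wikidata_time(time: str, precision: int) -> str:
    """Render a Wikidata time value at its stated precision.

    Handles only 9 (year), 10 (month) and 11 (day); coarser precisions
    (decades, centuries, ...) are out of scope for this sketch.
    """
    sign, rest = time[0], time[1:]
    year, month, day = rest.split("T")[0].split("-")
    year = ("-" if sign == "-" else "") + str(int(year))
    if precision == 9:
        return year
    if precision == 10:
        return f"{year}-{month}"
    if precision == 11:
        return f"{year}-{month}-{day}"
    raise ValueError(f"precision {precision} not handled in this sketch")


print(property_label_url("P571"))
print(format_wikidata_time("+2013-00-00T00:00:00Z", 9))
```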

How to use the type "expandable" correctly?

Question
How to use the type "expandable" correctly?
Description
I have a batch job that runs every hour and sends some stats to our Slack. Each JSON output can be quite large, and I'm looking for a way to make it collapsible/expandable.
I was playing with Slack's Block Kit Builder in the hope that it had something like that, and (when looking at the message errors) I came across a type called expandable.
However, there seems to be no documentation for it. All I know is that it:
requires a blocks property
should be a child of a blocks property
What I've tried
I went to the Block Kit Builder (demo) and got this to validate without any errors, but there was no visual output:
{
"blocks": [
{
"type": "expandable",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "Hello, Assistant to the Regional Manager Dwight! *Michael Scott* wants to know where you'd like to take the Paper Company investors to dinner tonight.\n\n *Please select a restaurant:*"
}
}
]
}
]
}
What I would like:
A working collapsible/expandable block, and an understanding of what the different properties are and how they work.

How to get value of "extract" key from a Wikipedia JSON response

I want to read the value of the "extract" key in a Wikipedia JSON response in Python 3. The URL I'm testing with is https://en.wikipedia.org/w/api.php?action=query&titles=San%20Francisco&prop=extracts&format=json.
The response looks like this:
{
"batchcomplete": "",
"query": {
"pages": {
"49728": {
"pageid": 49728,
"ns": 0,
"title": "San Francisco",
"extract": "<p><b>San Francisco</b></p>"
}
}
}
}
I removed most of the content because it was long.
The problem is how to read the page number programmatically. The page number changes with different searches, and I definitely don't want to hard-code it. What do I put instead of the page number?
content = response.query.pages.<page number>.extract
Is there any way to extract the key from the pages object and then get its value?
One possible way to do this is with the .keys() method:
page_number = list(data["query"]["pages"].keys())[0]  # data: the parsed JSON dict
I found that I can solve this problem by using Python's .keys() method. I did this.
key = list(response['query']['pages'].keys())
print(response['query']['pages'][key[0]]['extract'])
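Since the page-ID keys are not actually needed, another option is to iterate over the values of pages, which also copes with queries that return several pages. A sketch with the trimmed response from the question (`extracts` is a hypothetical helper; `.get` covers pages that have no extract). Alternatively, adding formatversion=2 to the query, as in the first question on this page, returns pages as a list instead of an object keyed by page ID.

```python
import json

# Trimmed response from the question.
RESPONSE_TEXT = """
{"batchcomplete": "",
 "query": {"pages": {"49728": {"pageid": 49728, "ns": 0,
   "title": "San Francisco", "extract": "<p><b>San Francisco</b></p>"}}}}
"""


def extracts(response: dict) -> dict:
    """Map page title -> extract without knowing the page-ID keys."""
    return {
        page["title"]: page.get("extract", "")
        for page in response["query"]["pages"].values()
    }


data = json.loads(RESPONSE_TEXT)
print(extracts(data))
```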

Parsing JSON data with html tags in javascript

I am making a call to an API that returns something similar to this:
{
node: {
16981: {
type: "story",
title: "The day I found a dollar",
body: "<p>Today i was going to the mall when I found a dollar</p><p>This wasn&39t just any dollar it was a magical dollar</p>
}
17005: {
type: "story",
title: "My Big Thanksgiving",
body: "<p>Thanksgiving has always been one of my favorite hollidays</p><p>My favorite part of thanks giving is eating all of the food. I really like food</p><p>This year I'm going to eat only turkey</p>
}
}
I can access the data without any problem; however, when I add the body of a story to the page, it still shows the <p> tags and the odd &#039, which I'm fairly sure is an apostrophe. I have played around with JSON.stringify, but if I stringify the entire response it is extremely difficult to parse (and I have trimmed much of the data here to keep it brief). If I stringify just the body, it returns the same string. Thanks in advance; I will be around to answer questions.
If we assume the above actually were valid JSON, i.e. without these problems:
unquoted attribute / key names
missing end quotes
missing commas and a missing closing }
the malformed HTML entity &39 (an apostrophe would be &#39;)
so that it looked like this:
{
"node": {
"16981": {
"type": "story",
"title": "The day I found a dollar",
"body": "<p>Today i was going to the mall when I found a dollar</p><p>This wasn&#39;t just any dollar it was a magical dollar</p>"
},
"17005": {
"type": "story",
"title": "My Big Thanksgiving",
"body": "<p>Thanksgiving has always been one of my favorite hollidays</p><p>My favorite part of thanks giving is eating all of the food. I really like food</p><p>This year I'm going to eat only turkey</p>"
}
}
}
Then there would be no problem at all doing, for example:
document.getElementById('content').innerHTML += json.node[17005].body;
without any parsing or stringifying (demo: http://jsfiddle.net/zdLbp89z/).
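You should use the innerHTML property for it.
In JS you can do it like this (getElementById takes the bare id, without "#", and the body lives under a specific node id):
document.getElementById("yourId").innerHTML = result.node["16981"].body;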
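In the browser, innerHTML renders the markup and decodes the entity for you. If you instead need plain text (for example when processing the same response server-side), the Python standard library can do both steps; this is a sketch in a different language from the thread, and `TextExtractor` and `body_to_text` are hypothetical names.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text of an HTML fragment, one line per closed <p>."""

    def __init__(self):
        # convert_charrefs=True decodes references such as &#39; before handle_data
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_endtag(self, tag):
        if tag == "p":
            self.parts.append("\n")


def body_to_text(body_html: str) -> str:
    parser = TextExtractor()
    parser.feed(body_html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts).strip()


body = (
    "<p>Today i was going to the mall when I found a dollar</p>"
    "<p>This wasn&#39;t just any dollar it was a magical dollar</p>"
)
print(body_to_text(body))
```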