How to get table data from Wikipedia page? - json

Is there somebody who knows how to use the Wikipedia API to get JSON or XML data out from a table on a specific Wikipedia page?
Is there maybe a different way to do this?
For example from here https://en.wikipedia.org/wiki/List_of_action_films_of_the_2010s

You can use curl (or use any other method/tool) to retrieve and/or parse a Wikipedia-URL via the public API. Here are two examples that should help you:
Retrieval of List_of_action_films_of_the_2010s:
JSON unparsed via the query action
JSON parsed via the parse action
Next, you would need to parse for and/or select the sub-elements relevant for your analysis. In this case I would assume: wikitable elements.
For reference and a detailed explanation, you can have a look at the general API page of MediaWiki and at the list of parameters on how to use the API to parse Wikipedia pages for certain data elements.

Related

Request for example server side generated JSON for HPP integration

I'm trying to use a full page redirect with a direct integration and if I'm reading the documentation correctly I believe I should be able to generate the server side JSON to pass into RealexHpp.redirect. I know the code to generate this JSON is shared in a number of languages, but is the raw JSON output shared anywhere? I ask as the language I'm writing in isn't one of the ones covered, so I'm trying to make sure I get the output format correct.
I've tried re-creating the JSON structure based on what I believe the Java code displayed should output, but I'm obviously doing something wrong as its not working, would be really useful if I had some raw JSON to compare it against to make sure I'm getting the structure right.
Many thanks,
Raw JSON examples are not available, but we do have HTML POST examples (https://developer.globalpay.com/hpp/card-payments). You can build a JSON based on these.
This is how the JSON should look like: {"MERCHANT_ID":"MerchantId","ACCOUNT":"internet","ORDER_ID":"N6qsk4kYRZihmPrTXWYS6g","AMOUNT":"1999","CURRENCY":"EUR","TIMESTAMP":"20221121100715","AUTO_SETTLE_FLAG":"1","SHIPPING_CODE":"50001|Apartment 825","SHIPPING_CO":"US","HPP_SHIPPING_STREET1":"Apartment 825","HPP_SHIPPING_STREET2":"Complex 741","HPP_SHIPPING_STREET3":"House 963","HPP_SHIPPING_CITY":"Chicago","HPP_SHIPPING_STATE":"IL","HPP_SHIPPING_POSTALCODE":"50001","HPP_SHIPPING_COUNTRY":"840","BILLING_CODE":"59|123","BILLING_CO":"GB","HPP_BILLING_STREET1":"Flat 123","HPP_BILLING_STREET2":"House 456","HPP_BILLING_STREET3":"Unit 4","HPP_BILLING_CITY":"Halifax","HPP_BILLING_POSTALCODE":"W5 9HR","HPP_BILLING_COUNTRY":"826","HPP_CUSTOMER_EMAIL":"james.mason#example.com","HPP_CUSTOMER_PHONENUMBER_MOBILE":"44|07123456789","HPP_PHONE":"44|07123456789","HPP_ADDRESS_MATCH_INDICATOR":"FALSE","HPP_VERSION":"2","SHA1HASH":"308bb8dfbbfcc67c28d602d988ab104c3b08d012"}

Is there a possibility to retrieve liquipedia page content in json?

I got this page - https://liquipedia.net/dota2/Players_(all) which I want to get in JSON format, is there any open liquipedia (which is actually wikipedia based site) api's I can use? Or maybe there are approaches to convert html to json?
There appears to be a python API that would do what your looking for at https://github.com/c00kie17/liquipediapy#dota_get_players

converting json string to texteditor

i am a real noob. The thing is, i dont know where to turn. we, as in myself and my thesis partner are trying to build (theoreticly) a platform which has an own search engine directed only at the databases of wikipedia. Now when you download articles or an article, using the wikipedias api you get a json formatted string, now the question is how do i convert that json code string into just text?
like a plain simple english?
and if you have any suggestions on search algorithms for the search engine that would also be helpfull.
(note) if you can give me an explanation otherwise just give directions on keywords to search for
You would typically use a function to convert that json to object and then take the property that is of interest to you.
pseudo code:
function getTextFromwikiArticleJson(json){
var realobject = JSON.decode(json);
return realobject.content.body;
}
JSON is just (one of) the format(s) used to transfer data over internet. It then needs to be processed (i.e. parsed) in order to extract the information you want.
A wikipedia API call might give you much more information than you actually want (so called meta-data), such as creation date, authors, etc.
You are probably interested in the text of the article, so you need to access the corresponding entry in the JSON.
Try pasting the JSON in an online parser, such as https://jsonformatter.org/json-parser.
Also, read about JSON here: https://en.wikipedia.org/wiki/JSON
Most programming languages have functions to parse JSON data (often via external libraries, such as gson for Java, for example)

Returning MarkLogic EVAL REST service output as JSON

I am working on a demo using MarkLogic to store emails exported from Outlook as XML, so that they stay searchable and accessible when I move away from Outlook.
I am using an AngularJS front-end calling either the native MarkLogic REST services of own REST services written in JAVA using Jersey.
MarkLogic SEARCH REST service works very well to get back a list of references to documents based on various search criteria, but I also want to display information stored inside the found documents.
I would like to avoid multiple REST calls and to get back only the needed information, so I am trying to use the EVAL REST service to run an xQuery.
It works well to get XML back (inside a multipart/mixed message) but I don't seem to be able to get JSON instead which would be much more convenient and is very easy with most other MarkLogic REST services.
I could use "json:transform-to-json()" in my xQuery or transform the XML to JSON in my JAVA code, but that does not look very elegant to me.
Is there a more efficient method to get where I am trying to go ?
First, json:transform-to-json seems plenty elegant to me. But of course it's not always the right answer.
I see three options you haven't mentioned.
server-side transforms - REST search supports server-side transforms which transform each document when you perform a bulk read by query. Those server-side transforms could generate any json you need.
search extract-document-data - this the simplest way to extract portions of documents. But it seems best if your documents are json to match your json response. Otherwise you get xml in your json response . . . unless you're ok with that.
custom search snippets - another very powerful way to customize what search returns
All of these options don't require the privileges that eval requires, which is a very good thing. Since eval allows execution of arbitrary code on your server, it requires special privileges and should be used with great care. Two other options before you use eval are (1) custom xquery installed in an http server, and (2) REST extensions.
The answers from Sam are what I would suggest. Specifically I would set a search option for search-extract-document-data (This is a search API option. If you are posting the request, then you can add the option in the XML you post back. If you are using GET, then you need to register the option ahead of time and call it. Relevant URLs to assist:
https://docs.marklogic.com/guide/rest-dev/search#id_48838
https://docs.marklogic.com/guide/search-dev/appendixa#id_44222
As for json.. ML8 will transform content. Use the accept-header or just add format=json to your results...
Example - xml which is what my content is stored as:
http://localhost:8000/v1/search?q=watermellon
...
<search:result index="1" uri="/sample/collections/1.xml" path="fn:doc("/sample/collections/1.xml")" score="34816" confidence="0.5982239" fitness="0.6966695" href="/v1/documents?uri=%2Fsample%2Fcollections%2F1.xml" mimetype="application/xml" format="xml">
<search:snippet>
<search:match path="fn:doc("/sample/collections/1.xml")/x">
<search:highlight>watermellon</search:highlight>
</search:match>
</search:snippet>
</search:result>
...
Example - json which is what my content is stored as:
http://localhost:8000/v1/search?q=watermellon&format=json
...
"index":1,
"uri":"/sample/collections/1.xml",
"path":"fn:doc(\"/sample/collections/1.xml\")",
"score":34816,
"confidence":0.5982239,
"fitness":0.6966695,
"href":"/v1/documents?uri=%2Fsample%2Fcollections%2F1.xml",
"mimetype":"application/xml",
"format":"xml",
"matches":[
{
"path":"fn:doc(\"/sample/collections/1.xml\")/x",
"match-text":[
{
"highlight":"watermellon"
}
]
}
]
}
...
For real heavy-lifting, you can use server-side transforms as in Sam's description. One note about this. Server-side transformations are not part of the search API, but part of the REST API. Just mentioning it so you have some idea of which tool you are using in each case..

Best practices to produce JSON from NotesViews or DocumentCollections

I'm working on a custom control that will be fed by JSON content and I'm trying to find the best approach to produce and consume it.
Let say the JSON could be from:
Notes View (all documents)
Notes View (subset of documents based on a category or filter)
Notes Document Collection (from database.Search or database.FTSearch)
What I have on my mind is to define some Custom Properties where I can define:
URL that produces the JSON
Object
etc.
So far I'm considering:
REST Service control from ExtLib
XAgent that produces JSON
Domino URL ?ReadViewEntries and OutputFormat=JSON
Does anyone knows if the JSON object loaded in memory has a size limit?
Any suggestion will be appreciated.
Definitely go for the REST Service control from the Extension Library, offers by far the best combination of flexibility vs performance vs development time.
Matt
What about creating the JSON in the view itself and then just read the column values? http://www.eknori.de/2011-07-23/formula-magic/
If you want to parse the json object using ssjs, you can fetch it using an URLConnection and put the resulting object into a repeat control using the eval statement.