How to combine two Wikipedia API calls into one? - mediawiki

I'm making a call to a Wikipedia API which returns the title, short text, image and geo-coordinates of a location. My Wikipedia API is:
https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts|pageimages|coordinates&titles=Berlin&redirects=1&formatversion=2&exintro=1&explaintext=1&piprop=thumbnail&pithumbsize=400
Also I'm using another Wikipedia API which returns a list of place names according to their geo-coordinates:
https://en.wikipedia.org/w/api.php?format=json&action=query&list=geosearch&gsradius=1000&gscoord=52.5243700|13.4105300&gslimit=50&gsprop=type|dim|globe
For the second API I get a response like this:
"query": {
"geosearch": [
{
"pageid": 28782169,
"ns": 0,
"title": "1757 Berlin raid",
"lat": 52.523405,
"lon": 13.4114,
"dist": 122.4,
"primary": "",
"type": null,
"dim": 1000
},
{
"pageid": 526195,
"ns": 0,
"title": "Scheunenviertel",
"lat": 52.526111111111,
"lon": 13.41,
"dist": 196.9,
"primary": "",
"type": "landmark",
"dim": 1000
},
...
]
}
Now I want to combine these two searches into one API call. I want to add the information from my first API to the results of the second API, something like this:
"query": {
"geosearch": [
{
"pageid": 28782169,
"ns": 0,
"title": "1757 Berlin raid",
"lat": 52.523405,
"lon": 13.4114,
"dist": 122.4,
"primary": "",
"type": null,
"dim": 1000,
"pages": [
{
"pageid": 28782169,
"ns": 0,
"title": "1757 Berlin raid",
"extract": "Berlin is the capital of Germany and one of the 16 states of Germany. With a population of 3.5 million people, it is the second most populous city proper and the seventh.........",
"thumbnail": {
"source": "https://upload.wikimedia.org/wikipedia/commons/thumb/3/3b/Siegessaeule_Aussicht_10-13_img4_Tiergarten.jpg/400px-Siegessaeule_Aussicht_10-13_img4_Tiergarten.jpg",
"width": 400,
"height": 267
}
}
]
},
...
]
}
Is it possible to do this?

So, if I understand correctly, you want a single Wikipedia API request that returns the title, short text, image and geo-coordinates (your first API) for all the places (Wikipedia articles) located in a certain area, given by coordinates and a radius (your second API). If that is correct, you can do it this way:
1. Main parameters: format=json&action=query
2. Query parameters:
    redirects=1
    generator=geosearch (your second API: see point 3)
    prop=extracts|coordinates|pageimages (your first API: see points 4, 5 and 6)
3. Geosearch parameters (all generator parameters are prefixed with a "g"):
    ggslimit=20: the total number of results from the query (20 because of exlimit=20)
    ggsradius=1000&ggscoord=52.5243700|13.4105300: this is your entry point
4. Parameters for extracts: exintro=1&explaintext=1&exlimit=20 (max exlimit is 20)
5. Parameters for coordinates: coprop=type|dim|globe&colimit=20 (max colimit is 500)
6. Parameters for pageimages: piprop=thumbnail&pithumbsize=400&pilimit=20 (max pilimit is 50)
As you can see, the max colimit is 500 and the max pilimit is 50, but we can't use more than 20 because of exlimit.
Finally, your request is all of the parameters above joined together:
https://en.wikipedia.org/w/api.php?format=json&action=query&redirects=1&generator=geosearch&prop=extracts|coordinates|pageimages&ggslimit=20&ggsradius=1000&ggscoord=52.5243700|13.4105300&exintro=1&explaintext=1&exlimit=20&coprop=type|dim|globe&colimit=20&piprop=thumbnail&pithumbsize=400&pilimit=20
And here is the response:
"query":{
"pages":{
"2511":{
"pageid":2511,
"ns":0,
"title":"Alexanderplatz",
"extract":"Alexanderplatz (pronounced [\u0294al\u025bk\u02c8sand\u0250\u02ccplats]) is a large public square and transport hub in the central Mitte district of Berlin, near the Fernsehturm. Berliners often call it simply Alex, referring to a larger neighbourhood stretching from Mollstra\u00dfe in the northeast to Spandauer Stra\u00dfe and the Rotes Rathaus in the southwest.",
"coordinates":[
{
"lat":52.52166748,
"lon":13.41333294,
"primary":"",
"type":"landmark",
"dim":"1000",
"globe":"earth"
}
],
"thumbnail":{
"source":"https://upload.wikimedia.org/wikipedia/commons/thumb/d/da/Alexanderplatz_by_the_night_-_ProtoplasmaKid.webm/400px--Alexanderplatz_by_the_night_-_ProtoplasmaKid.webm.jpg",
"width":400,
"height":225
}
},
...
},
}
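To make the parameter list above easier to reuse, here is a minimal Python sketch that assembles the same request and flattens the "pages" mapping of a decoded response. The function names build_geosearch_url and summarize_pages are hypothetical helpers, not part of any library, and the sketch only builds the URL; it does not send the request. Note that urlencode percent-encodes the "|" separator as %7C.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_geosearch_url(lat, lon, radius=1000, limit=20):
    """Build one query URL that uses geosearch as a generator (hypothetical helper)."""
    limit = min(limit, 20)  # exlimit cannot exceed 20, so cap everything at 20
    params = {
        "format": "json",
        "action": "query",
        "redirects": 1,
        "generator": "geosearch",
        "prop": "extracts|coordinates|pageimages",
        "ggscoord": f"{lat}|{lon}",
        "ggsradius": radius,
        "ggslimit": limit,
        "exintro": 1,
        "explaintext": 1,
        "exlimit": limit,
        "coprop": "type|dim|globe",
        "colimit": limit,
        "piprop": "thumbnail",
        "pithumbsize": 400,
        "pilimit": limit,
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def summarize_pages(response):
    """Flatten the "pages" mapping of a decoded response into a list of places."""
    places = []
    for page in response.get("query", {}).get("pages", {}).values():
        coords = page.get("coordinates") or [{}]
        places.append({
            "title": page.get("title"),
            "extract": page.get("extract"),
            "lat": coords[0].get("lat"),
            "lon": coords[0].get("lon"),
            "thumbnail": page.get("thumbnail", {}).get("source"),
        })
    return places
```

The sketch assumes the response shape shown above, where "pages" is an object keyed by page ID.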

Related

Is it possible to retrieve more details from Facebook Graph API regarding cities for a page?

I'm developing a tool in Python which exports Facebook KPIs daily.
So far so good, I managed to get it to work and it is working fine.
But now my customer requested even more details for the cities returned by the page insights API.
For example:
Facebook returns a JSON like this
{
"data": [
{
"name": "page_impressions_by_city_unique",
"period": "day",
"values": [
{
"value": {
"Arcole, Italy": 1,
"Brentino Belluno, Italy": 2,
"Bolzano, Italy": 2,
"Naples, Italy": 1,
"Padua, Italy": 1,
"Rome, Italy": 1,
...
},
"end_time": "2019-05-13T07:00:00+0000"
}
],
"title": "Daily Reach by City",
"description": "Daily: Total Page Reach by user city. (Unique Users)",
"id": "***"
}
]
}
The real problem is that in Italy some cities share the same name but have different postal codes (CAP in Italian). That leads to potentially wrong attributions. So, finally, back to my question:
Can I get more information from these APIs, such as the postal code or the region? I could not find anything.
Thank you, and sorry for my bad English.

How to sort a list of restaurant names by restaurant rating (possibly from Google Places or Yelp Fusion API)

I have a csv file with thousands of restaurant names and addresses that I need to sort by rating (data that is not in the csv). Is there a way to fill in the csv with this data? Possibly with Google Places API or Yelp Fusion API?
Both the Google Places API and Yelp Fusion API let you obtain a restaurant’s rating if you query with the business name and address. I’m going to explain how to do this, but first a caution about compliance. What you describe is clearly against the terms of service for both APIs. The only permitted use of their data is to display it on a publicly available website or app. Fetching and retaining it in a csv file is clearly improper. The APIs are intended for real-time query and immediate display of results for your users.
Google requires that the Places data be displayed in conjunction with a Google map or an approved "powered by Google" image. Additionally, no "pre-fetching, caching, or storage of content" is permitted. For details see https://developers.google.com/places/web-service/policies
Yelp requires attribution, basically requiring you to display the star rating and the Yelp logo with a link back to the business page on Yelp for the restaurant you have queried. See https://www.yelp.com/developers/display_requirements Furthermore, you can’t “cache, record, pre-fetch, or otherwise store any portion of the Yelp Content for a period longer than twenty-four (24) hours from receipt of the Yelp Content, or attempt or provide a means to execute any scraping or "bulk download" operations.” For full text and terms see https://www.yelp.com/developers/api_terms
With the legalese out of the way, here’s how to request a restaurant’s rating from Google Places:
https://maps.googleapis.com/maps/api/place/findplacefromtext/json?input=Applebees,234 W 42nd St,New York,NY&inputtype=textquery&fields=formatted_address,name,rating&key=YOUR_API_KEY
And, the JSON response:
{
"candidates": [
{
"formatted_address": "234 W 42nd St, New York, NY 10036, USA",
"name": "Applebee's Grill + Bar",
"rating": 3.6
}
],
"status": "OK"
}
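The request URL above contains spaces and commas, which should be percent-encoded before sending. Below is a minimal Python sketch of building that URL and reading the rating out of a decoded response. The helper names build_find_place_url and first_rating are hypothetical, and the sketch does not send the request.

```python
from urllib.parse import urlencode

FIND_PLACE_ENDPOINT = "https://maps.googleapis.com/maps/api/place/findplacefromtext/json"


def build_find_place_url(name, address, api_key):
    """Build a percent-encoded Find Place URL (hypothetical helper)."""
    params = {
        "input": f"{name},{address}",
        "inputtype": "textquery",
        "fields": "formatted_address,name,rating",
        "key": api_key,
    }
    return f"{FIND_PLACE_ENDPOINT}?{urlencode(params)}"


def first_rating(response):
    """Return the rating of the first candidate, or None if there is none."""
    if response.get("status") != "OK":
        return None
    candidates = response.get("candidates", [])
    return candidates[0].get("rating") if candidates else None
```

For the response shown above, first_rating would return 3.6.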
Here is the same request for Yelp Fusion. There is no way to request just the rating. Results always contain everything in their database for the restaurant:
https://api.yelp.com/v3/businesses/search?term=applebees&location=234 W 42nd St,New York,NY&limit=1
JSON response:
{
"businesses": [
{
"id": "gytFjzBw-z5LZD-6JSMChg",
"alias": "applebees-grill-bar-new-york-3",
"name": "Applebee's Grill + Bar",
"image_url": "https://s3-media1.fl.yelpcdn.com/bphoto/CLizyj9S7pMvwGNm2dgdiQ/o.jpg",
"is_closed": false,
"url": "https://www.yelp.com/biz/applebees-grill-bar-new-york-3?adjust_creative=pnOv3Zj2REsNDMU4Z3-SLg&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=pnOv3Zj2REsNDMU4Z3-SLg",
"review_count": 444,
"categories": [
{
"alias": "tradamerican",
"title": "American (Traditional)"
},
{
"alias": "burgers",
"title": "Burgers"
},
{
"alias": "sportsbars",
"title": "Sports Bars"
}
],
"rating": 2,
"coordinates": {
"latitude": 40.756442,
"longitude": -73.988838
},
"transactions": [
"delivery",
"pickup"
],
"price": "$$",
"location": {
"address1": "234 W 42nd St",
"address2": "",
"address3": "",
"city": "New York",
"zip_code": "10036",
"country": "US",
"state": "NY",
"display_address": [
"234 W 42nd St",
"New York, NY 10036"
]
},
"phone": "+12123917414",
"display_phone": "(212) 391-7414",
"distance": 5.938732504864397
}
],
"total": 2900,
"region": {
"center": {
"longitude": -73.98880004882812,
"latitude": 40.75648701137637
}
}
}
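Yelp Fusion requests also need an Authorization header carrying your API key as a bearer token. As a rough sketch, the request above could be prepared in Python like this. The helper names are hypothetical, the request object is only built (not sent), and only the rating is pulled out of the decoded response.

```python
from urllib.parse import urlencode
from urllib.request import Request

YELP_SEARCH_ENDPOINT = "https://api.yelp.com/v3/businesses/search"


def build_yelp_search_request(term, location, api_key):
    """Prepare a business search request for one restaurant (hypothetical helper)."""
    query = urlencode({"term": term, "location": location, "limit": 1})
    return Request(
        f"{YELP_SEARCH_ENDPOINT}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def first_business_rating(response):
    """Return the rating of the first business in a decoded response, if any."""
    businesses = response.get("businesses", [])
    return businesses[0].get("rating") if businesses else None
```

For the response shown above, first_business_rating would return 2.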

Retrieve a JSON object from JSON array using Cloudant

I am doing an API call every 40 mins to retrieve the current status information of every car in a car fleet. And each call adds one new JSON document to a Cloudant database. Each JSON document defines the current availability status for every car across many locations in many cities. There are currently around 2200 JSON documents in the database. All JSON documents have one field called payload that contains all information; it is a large array of objects. Instead of retrieving the whole payload array of objects I would like to retrieve only the needed info with a query (so, only one or several objects of that array). However, I have difficulty drafting a query that results only in the needed data.
Below, I'll explain my problem in more detail:
When saving the JSON document to Cloudant, a timestamp is defined in the document. The _id parameter is defined to be equal to this timestamp. Below, I show a simplified version of these JSON documents:
{
"_id": "1540914946026",
"_rev": "3-c1834c8a230cf772e41bbcb9cf6b682e",
"timestamp": 1540914946026,
"datetime": "2018-10-30 15:55:46",
"payload": [
{
"cityName": "Abcoude",
"locations": [
{
"address": "asterlaan 28",
"geoPoint": {
"latitude": 52.27312,
"longitude": 4.96768
},
"cars": [
{
"model": "BMW",
"state": "FREE"
}
]
}
],
"availableCars": 1,
"occupiedCars": 0
},
{
"cityName": "Alkmaar",
"locations": [
{
"address": "Aert de Gelderlaan 14",
"geoPoint": {
"latitude": 52.63131,
"longitude": 4.72329
},
"cars": [
{
"model": "Volkswagen",
"state": "FREE"
}
]
},
{
"address": "Ardennenstraat 49",
"geoPoint": {
"latitude": 52.66721,
"longitude": 4.76046
},
"cars": [
{
"model": "BMW",
"state": "FREE"
}
]
},
{
"address": "Beneluxplein 7",
"geoPoint": {
"latitude": 52.65356,
"longitude": 4.75817
},
"cars": [
{
"model": "BMW",
"state": "FREE"
}
]
},
{
"address": "Dr. Schaepmankade 1",
"geoPoint": {
"latitude": 52.62595,
"longitude": 4.75122
},
"cars": [
{
"model": "BMW",
"state": "OCCUPIED"
}
]
},
{
"address": "Kennemerstraatweg",
"geoPoint": {
"latitude": 52.62909,
"longitude": 4.74226
},
"cars": [
{
"model": "Mercedes",
"state": "FREE"
}
]
},
{
"address": "NS Station Alkmaar Noord/Parkeerterrein Noord",
"geoPoint": {
"latitude": 52.64366,
"longitude": 4.7627
},
"cars": [
{
"model": "Tesla",
"state": "FREE"
}
]
},
{
"address": "NS Station Alkmaar/Stationsweg 56",
"geoPoint": {
"latitude": 52.6371,
"longitude": 4.73935
},
"cars": [
{
"model": "Tesla",
"state": "FREE"
}
]
},
{
"address": "Oude Hoeverweg",
"geoPoint": {
"latitude": 52.63943,
"longitude": 4.72928
},
"cars": [
{
"model": "Tesla",
"state": "FREE"
}
]
},
{
"address": "Parkeerterrein Wortelsteeg",
"geoPoint": {
"latitude": 52.63048,
"longitude": 4.75487
},
"cars": [
{
"model": "Tesla",
"state": "OCCUPIED"
}
]
},
{
"address": "Schoklandstraat 38",
"geoPoint": {
"latitude": 52.65812,
"longitude": 4.75359
},
"cars": [
{
"model": "Volkswagen",
"state": "FREE"
}
]
}
],
"availableCars": 8,
"occupiedCars": 2
}
]
}
As you can see, the payload field is an array that has several objects (FYI: every object in this array represents one specific city: there are 1600 cities, so 1600 nested objects inside the payload array). Furthermore, inside each of the 1600 objects mentioned, other arrays and objects are again nested inside. For all objects in the payload array, the first field is cityName.
Furthermore, there is a nested array locations (inside each of the 1600 objects of the payload array) representing all addresses in a specific city. The locations array can be of size 1 to 600, meaning 1 to 600 nested objects / addresses per city. The last two fields in all objects of the payload array are availableCars and occupiedCars.
I want to query documents to see how many cars are available and occupied for a specific city during a specific time interval. To do this:
I have to specify a start timestamp (or id) and an end timestamp, resulting in only the JSON documents within this interval.
Furthermore, I will need to specify inside the JSON documents only one or more specific cities by cityName (there are 1600 cities) and then get the number of available cars availableCars and the number of occupiedCars for those cities.
For example, in this simplified example, I would like to query for the status information (availableCars & occupiedCars) for the city of Alkmaar from 1540914946026 (epoch time) until now. I would like to get the following result:
{
"id":"1540914946026",
"cityName":"Alkmaar",
"availableCars":8,
"occupiedCars":2
}
This is just an example, in reality, I want to be able to query for other cities as well, or query for several cities together and then get for each of those cities the number of available cars availableCars and the number of occupied cars occupiedCars.
Could anyone help me to define a query and index to be able to get the above result? Can I do this with Cloudant Query?
Your data model does not play to Cloudant's strengths. Let each document group data that changes and is accessed together. The items in your payload array would be much better stored as discrete documents.
If you find yourself reaching into growing arrays inside documents for subsets of data, this is a warning sign that your data model is not ideal: the document is now mutable and growing (with potential update conflicts as a result), and access becomes more cumbersome over time, as Cloudant has no mechanism to retrieve only parts of a document. Moreover, Cloudant has a limit (1 MB) on document size, so with your proposed model you will likely hit that limit too, and your application would stop working.
With that said, it is possible to create a view index that lets you emit each component of your payload, which would let you look up data per city -- but that solution is still subject to all the limitations above (document model is mutable, documents grow large etc).
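Real Cloudant views are defined as JavaScript map functions in a design document. The Python sketch below only emulates that logic client-side, to show what "emit each component of your payload" means. It assumes rows keyed by (cityName, timestamp), and the function names map_snapshot and query_city are hypothetical.

```python
def map_snapshot(doc):
    """Yield one (key, value) row per city, as a view's map function would emit."""
    for city in doc.get("payload", []):
        key = (city["cityName"], doc["timestamp"])
        value = {
            "availableCars": city["availableCars"],
            "occupiedCars": city["occupiedCars"],
        }
        yield key, value


def query_city(docs, city_name, start_ts, end_ts):
    """Emulate a key-range lookup over the emitted rows for one city."""
    rows = sorted(
        (row for doc in docs for row in map_snapshot(doc)),
        key=lambda row: row[0],
    )
    return [
        {"id": str(ts), "cityName": name, **value}
        for (name, ts), value in rows
        if name == city_name and start_ts <= ts <= end_ts
    ]
```

In an actual view, the same lookup would be expressed as a key range over the emitted keys rather than a filter in application code.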
Rule of thumb: small documents. Immutable model, where possible. Documents group data that either change, or are accessed as a unit.
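As an illustration of that rule of thumb, here is a minimal Python sketch that splits one fetched snapshot into one small document per city before saving. The _id scheme ("cityName:timestamp") and the "type" field are assumptions made for this example, not Cloudant requirements, and split_snapshot is a hypothetical helper.

```python
def split_snapshot(doc):
    """Turn one large snapshot into one small, immutable document per city."""
    return [
        {
            # Hypothetical ID scheme: city name plus snapshot timestamp.
            "_id": f"{city['cityName']}:{doc['timestamp']}",
            "type": "cityStatus",  # hypothetical discriminator field
            "timestamp": doc["timestamp"],
            "cityName": city["cityName"],
            "availableCars": city["availableCars"],
            "occupiedCars": city["occupiedCars"],
        }
        for city in doc.get("payload", [])
    ]
```

Each resulting document stays small and is never updated after it is written. The locations array is left out here on purpose; it could be stored the same way as further discrete documents if it is needed.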

Clarification required on Google Maps DistanceMatrixResponse

I am reading about Google Maps Distance Matrix responses and am unable to understand how the response can have four distances when there are only two source-destination pairs. The following is from the documentation. I have used the API before, but not this particular service. Please clarify. Maybe I am missing something basic here.
{
"origin_addresses": [ "Greenwich, Greater London, UK", "13 Great Carleton Square, Edinburgh, City of Edinburgh EH16 4, UK" ],
"destination_addresses": [ "Stockholm County, Sweden", "Dlouhá 609/2, 110 00 Praha-Staré Město, Česká republika" ],
"rows": [ {
"elements": [ {
"status": "OK",
"duration": {
"value": 70778,
"text": "19 hours 40 mins"
},
"distance": {
"value": 1887508,
"text": "1173 mi"
}
}, {
"status": "OK",
"duration": {
"value": 44476,
"text": "12 hours 21 mins"
},
"distance": {
"value": 1262780,
"text": "785 mi"
}
} ]
}, {
"elements": [ {
"status": "OK",
"duration": {
"value": 96000,
"text": "1 day 3 hours"
},
"distance": {
"value": 2566737,
"text": "1595 mi"
}
}, {
"status": "OK",
"duration": {
"value": 69698,
"text": "19 hours 22 mins"
},
"distance": {
"value": 1942009,
"text": "1207 mi"
}
} ]
} ]
}
The documentation states, and I quote:
The supported fields in a response are explained below.
originAddresses is an array containing the locations passed in the origins field of the Distance Matrix request. The addresses are returned as they are formatted by the geocoder.
destinationAddresses is an array containing the locations passed in the destinations field, in the format returned by the geocoder.
rows is an array of DistanceMatrixResponseRow objects, with each row corresponding to an origin.
elements are children of rows, and correspond to a pairing of the row's origin with each destination. They contain status, distance, and duration information for each origin/destination pair.
The distance, duration and duration_in_traffic fields for each element include both a value (which is always shown in meters or seconds), and a text field, which supplies a more human-readable version of the information. The distance's text value is formatted according to the unitSystem specified in the request (or in metric, if no preference was supplied).
The example you've given above shows two origins and two destinations; they are not paired. Each row in the response corresponds to an origin, and each element in that row is a route from that origin to one of the destinations.
In the example above, it returns the distance from Greenwich to Stockholm County and to Prague (Czech Republic), and then the distance from Edinburgh to the same two destinations. So you get the distances from point A to C and D, and then from point B to C and D: 2 origins × 2 destinations = 4 elements.
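To make the row/element layout concrete, here is a small Python sketch that walks a decoded response of the shape shown in the question and lists every origin/destination pair. The function name pair_elements is hypothetical. It relies only on the documented ordering: rows follow the origins, and elements within a row follow the destinations.

```python
def pair_elements(response):
    """List (origin, destination, metres, seconds) for every origin/destination pair."""
    pairs = []
    for origin, row in zip(response["origin_addresses"], response["rows"]):
        for destination, element in zip(response["destination_addresses"], row["elements"]):
            if element["status"] != "OK":
                continue  # no route found for this particular pair
            pairs.append((
                origin,
                destination,
                element["distance"]["value"],
                element["duration"]["value"],
            ))
    return pairs
```

With two origins and two destinations, this yields four tuples, one per element in the response.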
Does that clarify things a little?

What is correct way of sending list/tabular data JSON with REST?

I am working on a RESTful API. One of our screens shows a table with a grand total.
Below are two JSON responses for returning the data:
First
[
{
"name": "Richard",
"bank_balance": 3000,
"assets_worth": 4000,
"total": 7000
},
{
"name": "John",
"bank_balance": 1000,
"assets_worth": 2000,
"total": 3000
},
{
"name": "Total",
"bank_balance":4000,
"assets_worth": 6000,
"total": 10000
}
]
Second
{
"rows": [
{
"name": "Richard",
"bank_balance": 3000,
"assets_worth": 4000,
"total": 7000
},
{
"name": "John",
"bank_balance": 1000,
"assets_worth": 2000,
"total": 3000
}
],
"grand_total":
{
"name": "Total",
"bank_balance":4000,
"assets_worth": 6000,
"total": 10000
}
}
Which one is more correct according to REST standards?
REST is merely an architectural style for designing networked applications. It doesn't directly answer your question about data structuring.
Personally I would go with the first approach (just without the total row) as grand total can be trivially calculated from the row data, resulting in something like:
[
{
"name": "Richard",
"bank_balance": 3000,
"assets_worth": 4000,
"total": 7000
},
{
"name": "John",
"bank_balance": 1000,
"assets_worth": 2000,
"total": 3000
}
]
I think the important design principle here is that your API should not be opinionated about data representation. Some applications that use your API may choose to display the data in a tabular format, while others may choose some other representation. A good API caters equally well to different applications (and use cases).
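To show how little work the client has to do, here is a minimal Python sketch that derives the grand total from the plain row list. The function name grand_total and its default field list are assumptions made for this example.

```python
def grand_total(rows, fields=("bank_balance", "assets_worth", "total")):
    """Sum the numeric columns of the row list into one totals mapping."""
    return {field: sum(row[field] for row in rows) for field in fields}


rows = [
    {"name": "Richard", "bank_balance": 3000, "assets_worth": 4000, "total": 7000},
    {"name": "John", "bank_balance": 1000, "assets_worth": 2000, "total": 3000},
]
totals = grand_total(rows)
```

A client that wants a table can append the totals as a footer row, while a client that draws a chart can simply ignore them.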