How to get the Wikidata item using the pageprops API of MediaWiki

I saw a couple of solutions for getting the Wikidata item of a Wikipedia page. Generally, they use this pageprops query API:
https://en.wikipedia.org/w/api.php?action=query&prop=pageprops&redirects=1&titles=LONDON
But in my small MediaWiki installation, when I perform the same query, the result does not contain any Wikidata ID, even though the page is sitelinked to a Wikidata item.
Is there any data/script I need to run? What might be the possible cause of this?
Additional Info:
I also found that when I list the page prop names, wikibase_item is not among them. Below is the same example from Wikipedia, which does work:
https://en.wikipedia.org/w/api.php?action=query&list=pagepropnames&ppnlimit=100

If I understand your question correctly, you want to use wbentityusage in your query to get the Wikidata IDs. So, for your particular example, with London, you would use:
https://en.wikipedia.org/w/api.php?action=query&prop=pageprops|wbentityusage&titles=London
The entry at the bottom has all the data associated with London. Scroll down in the JSON and you'll see:
"wbentityusage": {
...
"Q84": {
...
}
}
Q84 also shows as the "wikibase_item" under pageprops.
Then, all the data associated with London is at:
https://www.wikidata.org/wiki/Q84
(which can also be accessed via wikidata apis)
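For instance, here is a minimal Python sketch (using the requests library) that pulls the wikibase_item out of the pageprops result. The endpoint and parameters are the ones from the query above, with format=json added for machine-readable output; it should work the same against your own wiki's api.php once Wikibase is wired up:

import requests

API = "https://en.wikipedia.org/w/api.php"  # point this at your own wiki's api.php

def wikidata_id(title):
    """Return the Wikidata item ID for a page title (e.g. 'Q84'), or None."""
    params = {
        "action": "query",
        "prop": "pageprops",
        "redirects": 1,
        "titles": title,
        "format": "json",
    }
    data = requests.get(API, params=params).json()
    for page in data["query"]["pages"].values():
        return page.get("pageprops", {}).get("wikibase_item")

print(wikidata_id("London"))  # prints Q84 on en.wikipedia.org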
EDIT: Here's yet another example. (For Wikidata-specific items, it can sometimes help to work in reverse, i.e., list all the pages that reference item Q[nnn].)
The following Wikipedia page uses some Wikidata item(s): https://en.wikipedia.org/wiki/Template:Pageid_to_title
Specifically, it uses Earth (Q2). So, if we use the Wikipedia API:
https://en.wikipedia.org/w/api.php?action=query&prop=pageprops|wbentityusage&titles=Template:Pageid_to_title
It will show Q2 under wbentityusage for pageid 49086285, not under pageprops. Pageprops will only show the wikibase_item.
It's important to note that even though the query is identical on both Wikipedia and Wikidata, the results will differ depending on which domain you run it against.
Also helpful (working in reverse): to see which Wikipedia pages reference a particular Wikidata item, you would use
https://en.wikipedia.org/w/api.php?action=query&list=wblistentityusage&wbeuentities=Q2&wbeuprop=url
--> shows all the Wikipedia pages referencing Q2 (Earth)
https://wikidata.org/w/api.php?action=query&list=wblistentityusage&wbeuentities=Q2&wbeuprop=url
--> shows all the Wikidata pages referencing Q2 (Earth)
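A hedged Python sketch of that reverse lookup, with standard MediaWiki query continuation; I'm assuming the list results come back keyed by the module name, wblistentityusage, as with other list modules:

import requests

API = "https://en.wikipedia.org/w/api.php"

def pages_using(entity):
    """Yield titles of pages on this wiki that use the given entity (e.g. 'Q2')."""
    params = {
        "action": "query",
        "list": "wblistentityusage",
        "wbeuentities": entity,
        "wbeuprop": "url",
        "format": "json",
    }
    while True:
        data = requests.get(API, params=params).json()
        for page in data["query"]["wblistentityusage"]:
            yield page["title"]
        if "continue" not in data:
            break
        params.update(data["continue"])  # standard query-continue convention

for title in pages_using("Q2"):
    print(title)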
And, just for fun, I edited that Wikipedia page Template:Pageid_to_title to reference Q3 (Life) and, after a little patience waiting for things to sync, the API now reports Q3 as one of the Wikidata items associated with that page.


Amazon: product advertising api pagination top sellers

Is this a limitation of the amazon API?
I would like to pull data similar to this page: amazon.com/Best-Sellers-Home-Improvement-Pumps-Plumbing-Equipment/zgbs/hi/13749581/ref=zg_bs_nav_hi_1_hi
I am using:
operation: 'BrowseNodeLookup',
response_group: "BrowseNodeInfo,TopSellers"
The TopSeller response group only returns 10 items and does not respond to ItemPage.
Is there a way to do an item lookup using a browse node and sorting by popularity, without a search query?
The AWS documentation on the BrowseNodeLookup API and the TopSellers response group indicates that it only includes the top 10, and there is no mention of pagination.
The TopSellers response group returns the ASINs and titles of the 10 best sellers within a specified browse node.
However, the results from TopSellers are basically equivalent to the results of an ItemSearch with Sort set to salesrank. Therefore, you can solve pagination requirements as follows:
On initial load (such as a user loading a web page or opening a particular view in a mobile application), issue BrowseNodeLookup and retrieve TopSellers. Populate some portion of the UI with information from the browse node and some other portion of the UI with the TopSellers results.
If the user never goes past the first page, then do nothing more. (There is no need to spend time on an additional service call.)
As the user navigates to subsequent pages, issue ItemSearch with Sort set to salesrank and ItemPage set to the page number. Use these results to update the portion of the web page/view in your application that was previously populated from the browse node TopSellers.
Note that you will still only be able to retrieve up to 10 pages worth of results. This is an ItemSearch API limitation.
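A minimal sketch of that two-step strategy in Python, using the bottlenose library as the Product Advertising API client. The browse node ID comes from the question's URL; the credentials are placeholders, and the SearchIndex value is an assumption (pick whichever index actually owns the node):

import bottlenose  # pip install bottlenose; classic Product Advertising API client

# Placeholder credentials -- substitute your own.
amazon = bottlenose.Amazon("ACCESS_KEY", "SECRET_KEY", "ASSOCIATE_TAG")

BROWSE_NODE = "13749581"  # the Pumps & Plumbing Equipment node from the question

# Initial load: browse node info plus its TopSellers (always just 10, no paging).
first_page = amazon.BrowseNodeLookup(
    BrowseNodeId=BROWSE_NODE,
    ResponseGroup="BrowseNodeInfo,TopSellers",
)

# Subsequent pages: ItemSearch sorted by salesrank, up to ItemPage=10.
def sales_rank_page(page):
    return amazon.ItemSearch(
        SearchIndex="HomeGarden",  # assumption: the index covering this node
        BrowseNode=BROWSE_NODE,
        Sort="salesrank",
        ItemPage=page,  # the API caps ItemSearch at page 10
        ResponseGroup="ItemAttributes",
    )

Both calls return raw XML, which you would feed to the XML parser of your choice.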

How to restrict fields returned by stackexchange api, and turn off paging?

I'd like to have a list of just the current titles for all questions on one of the smaller (fewer than 10,000 questions) Stack Exchange sites. I tried the interactive utility here: https://api.stackexchange.com/docs/questions. It both reports the result as JSON at the bottom and produces the requesting URL at the top. For example:
https://api.stackexchange.com/2.2/questions?order=desc&sort=activity&tagged=apples&site=cooking
returns this JSON in my browser:
{"items":[{"tags":["apples","crumble"],"owner":{ ...
...
...],"has_more":true,"quota_max":300,"quota_remaining":252}
What is quota? It was 10,000 on one search on one site, but suddenly it's only 300 here.
I won't be doing this very often. What I'd like is the quickest way to edit that URL (or a similar one, of course) so I can get a list of all of the titles on a small site. I don't understand how to use paging, and I don't need any of the other fields. I don't care if I get them, but I'm thinking that if I exclude them I can fetch more at once.
If I need to script it, python (2.7) is my preferred (only) language.
quota_max is the number of requests your application is allowed per day. 300 is the default for an unregistered application. This used to be mentioned directly on the page describing throttles, but seems to have been removed. Here is historical information describing the default.
To increase this to 10,000, you need to register an application and then authenticate by passing an access token in your script.
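If you prefer calling the API directly, the application key and access token travel as ordinary query-string parameters. A minimal sketch with requests (both credential values below are placeholders):

import requests

resp = requests.get(
    "https://api.stackexchange.com/2.2/questions",
    params={
        "site": "cooking",
        "order": "desc",
        "sort": "activity",
        "key": "YOUR_APP_KEY",         # issued when you register the application
        "access_token": "YOUR_TOKEN",  # obtained via the OAuth flow
    },
)
print(resp.json()["quota_max"])  # should report 10000 once the key is valid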
To get all titles on a site, you can use a Python library to help:
StackAPI (the answer below will use this library; DISCLAIMER: I wrote this library)
Py-StackExchange
SEAPI
StackPy
Assuming you have registered your application and authenticated, we can proceed.
First, install StackAPI (documentation):
pip install stackapi
This code will then grab the 10,000 most recent questions (max_pages * page_size) for the site hardwarerecs. Each page costs one API hit, so the more items per page, the fewer API calls.
from stackapi import StackAPI
SITE = StackAPI('hardwarerecs')
SITE.page_size = 100  # items per request (100 is the API maximum)
SITE.max_pages = 100  # stop after 100 pages: 100 * 100 = 10,000 items
# Filter to only get question title and link
filter = '!BHMIbze0EQ*ved8LyoO6rNjkuLgHPR'
questions = SITE.fetch('questions', filter=filter)
The questions variable holds a dictionary that looks very similar to the API output, except that the library has done all the paging for you. Your data is in questions['items'] and, in this case, contains a list of dictionaries that look like this:
[
 ...
 {u'link': u'http://hardwarerecs.stackexchange.com/questions/29/sound-board-to-replace-a-gl2200-in-a-house-of-worship-foh-setting',
  u'title': u'Sound board to replace a GL2200 in a house-of-worship FOH setting?'},
 {u'link': u'http://hardwarerecs.stackexchange.com/questions/31/passive-gps-tracker-logger',
  u'title': u'Passive GPS tracker/logger'}
 ...
]
This result set is limited to only the title and the link because of the filter we applied. You can find the appropriate filter by adjusting what fields you want in the web UI and copying the filter field.
The hardwarerecs parameter passed when creating the SITE object is the first part of the site's domain URL. Alternatively, you can find it by looking at the api_site_parameter for your site via the /sites endpoint.
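For example, you can list api_site_parameter values with a quick requests call (pagesize just keeps the request count down; check has_more in the response if you need further pages):

import requests

# The /sites endpoint is public and paged like everything else in the API.
resp = requests.get("https://api.stackexchange.com/2.2/sites",
                    params={"pagesize": 100})
for site in resp.json()["items"]:
    print(site["api_site_parameter"], "-", site["name"])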

Instagram Media Endpoint Paging

I'm currently looking at reading out posts and related JSON data from a given number of Instagram users using the following URL:
https://www.instagram.com/[user-login]/media/
This only brings back the latest 20 posts. I have done some hunting around and I am unable to see how to form the URL to bring back the next 20 results. Some places suggest using max_timestamp, but I can't see how to make it work.
For various reasons I do not wish to use the standard Instagram API.
You should use the max_id parameter for pagination.
Example: https://www.instagram.com/[user-login]/media/?max_id=[last-min-id], where [last-min-id] is the minimum id from the previous page. That id is not repeated in the new page.
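A sketch of that loop in Python, assuming the endpoint's historical response shape ({'items': [...], 'more_available': true/false}, with each item carrying an 'id'); note that the endpoint has since been disabled, as the update below mentions:

import requests

def fetch_all_media(user):
    """Page through a user's media via max_id until more_available is false."""
    url = "https://www.instagram.com/%s/media/" % user
    max_id = None
    while True:
        params = {"max_id": max_id} if max_id else {}
        data = requests.get(url, params=params).json()
        items = data.get("items", [])
        for item in items:
            yield item
        if not data.get("more_available") or not items:
            break
        max_id = items[-1]["id"]  # the minimum id on this page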
This endpoint ('https://www.instagram.com/[user-login]/media/') was turned off within the last few days; I'm unsure exactly when.
If you are dependent on it, you might want to check it now in your apps.
e.g. https://www.instagram.com/fosterandpartners/media/

How can I add "current streak" of contributions from github to my blog?

I have a personal blog I built using Rails. I want to add a section to my site that displays my current streak of GitHub contributions. What would be the best way to go about doing this?
Edit: for clarification, just the number of days is all that is necessary for me.
Considering the GitHub API for Users doesn't yet expose that particular information (the number of days in the current streak of contributions), you might have to:
scrape it (extract it by reading the user's GitHub page)
As klamping mentions in his answer (upvoted), the URLs to scrape would be:
https://github.com/users/<username>/contributions_calendar_data
https://github.com/users/<username>/contributions
(for public repos only, though)
SherlockStd has updated (May 2017) parsing code below:
https://github-stats.com/api/user/streak/current/:username
try projects which use https://github.com/users/<username>/contributions_calendar_data (as listed in Marques Johansson's answer, upvoted):
IonicaBizau/git-stats
akerl/githubchart (GitHub contribution SVG generator)
akerl/githubstats (GitHub contribution statistics)
build that graph yourself: see the GitHub project git-cal
git-cal is a simple script to view a commits calendar (similar to the GitHub contributions calendar) on the command line.
Each block in the graph corresponds to a day and is shaded with one of 5 possible colors, each representing the relative number of commits on that day.
or establish a service that will report, each day, any new commit for that given day to a Google Calendar (using the Google Calendar API through a project like nf/streak).
You can then read that information and report it on your blog.
You can find various examples of scraping that information:
github_team_calendar.py
weekend-commits.js
As in:
$.getJSON('https://github.com/users/' + location.pathname.replace(/\//g, '') + '/contributions_calendar_data', weekendWork);
leaderboard.rb:
Like:
leaderboard = members.map do |u|
  user_stats = get("https://github.com/users/#{u}/contributions_calendar_data")
  total = user_stats.map { |s| s[1] }.reduce(&:+)
  [u, total]
end
... (you get the idea)
The URL for the plain JSON data was:
https://github.com/users/[username]/contributions_calendar_data
[Edit: it looks like this URL no longer works.]
There is a URL which generates the SVG, which other answers have indicated. That is here:
https://github.com/users/[username]/contributions
Simply replace [username] with your GitHub username in the URL and you should be able to see the chart. See other answers for more in-depth explanations.
If you want something that matches the visual appearance of GitHub's chart, check out these projects, which use https://github.com/users/<username>/contributions_calendar_data but also apply other factors based on GitHub's logic:
https://github.com/akerl/githubchart
https://github.com/akerl/githubstats
[Project deprecated and unavailable for now; it will be back online soon.]
Since the URL https://github.com/users/<username>/contributions_calendar_data no longer works, you have to parse the SVG from https://github.com/users/<username>/contributions.
Unfortunately, GitHub loves security, and CORS is disabled on their server.
To solve this issue, I've set up an API for me and everyone who needs it: just GET https://github-stats.com/api/user/streak/current/{username} (CORS allowed), and you'll get an answer like:
{
    "success": true,
    "currentStreak": 3
}
https://github-stats.com will soon implement more stats endpoints :)
Please ask for new endpoints at https://github.com/SherloxFR/github-stats.com/issues; it will be a pleasure to find a way to implement them!
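If you'd rather compute the streak yourself from a server-side script (where CORS doesn't apply), here is a rough Python sketch. It assumes the contributions SVG still marks each day with data-count and data-date attributes in that order, which GitHub may change at any time:

import re
import requests

def current_streak(username):
    """Count consecutive days with contributions, ending at the latest day."""
    svg = requests.get(
        "https://github.com/users/%s/contributions" % username
    ).text
    # Assumed markup: <rect ... data-count="3" data-date="2017-05-01" .../>
    days = re.findall(r'data-count="(\d+)" data-date="(\d{4}-\d{2}-\d{2})"', svg)
    days.sort(key=lambda d: d[1])  # ISO dates sort chronologically as strings
    streak = 0
    for count, _date in reversed(days):
        if int(count) == 0:
            break  # note: a zero for "today" ends the streak here;
                   # GitHub's own streak logic may treat today differently
        streak += 1
    return streak

print(current_streak("octocat"))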

Tweet counter for identi.ca

Is there a way to retrieve the number of times a certain URL was "dented" (shared on identi.ca, StatusNet, and/or the like)?
For twitter there are several services that give this information.
Twitter itself: http://urls.api.twitter.com/1/urls/count.json?url=http://example.com&callback=twttr.receiveCount
Tweetmeme: http://api.tweetmeme.com/url_info.jsonc?url=http://example.com
Topsy: http://otter.topsy.com/stats.js?url=http://example.com&callback=?
I don't need the fancy extra information that Tweetmeme or Topsy deliver, only the amount.
I am aware that this is problematic given the "distributed" nature of StatusNet: it will only give a count from one single silo, e.g. identi.ca. However, for me, for now, that would be enough.
Is there such an endpoint that gives me such JSON?
I don't think so. There's a file table in StatusNet databases that holds references to dented URLs (so it wouldn't be hard to count them if you had access to the database or could write a plugin; i.e., you wouldn't have to parse all notices, just look up the file table), but it's not exposed through the API.
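For illustration, with direct database access the count would be a single query. This is a hedged sketch only: the table and column names (file with a url column, joined to notices through file_to_post) are assumptions about the StatusNet schema and would need checking against your installation, and the connection details are placeholders:

import pymysql  # pip install pymysql

conn = pymysql.connect(host="localhost", user="statusnet",
                       password="...", database="statusnet")
with conn.cursor() as cur:
    # Assumption: file stores one row per URL; file_to_post links it to notices.
    cur.execute("""
        SELECT COUNT(*)
        FROM file_to_post ftp
        JOIN file f ON f.id = ftp.file_id
        WHERE f.url = %s
    """, ("http://example.com",))
    (dent_count,) = cur.fetchone()
print(dent_count)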
The list of possible API calls for StatusNet is here: http://status.net/wiki/TwitterCompatibleAPI
In addition, there's a proposed Google Summer of Code project on this subject: Social Analytics plugin