PubMed API returns fewer results than web interface - pubmed

I'm trying to access PubMed results from R through their API, but I consistently get fewer results than the same query returns in the web interface. Digging into the output, I noticed that the problem lies in a different query translation between the two access methods.
I am using the rentrez package, but I get the same results with other related R packages, so I guess it's related to the API itself.
Here's the code to reproduce the results:
install.packages('rentrez')
rentrez::entrez_search(db="pubmed", term = '((model OR models OR modeling OR network OR networks) AND (dissemination OR transmission OR spread OR diffusion) AND (nosocomial OR hospital OR "long-term-care" OR "long term care" OR "longterm care" OR "long-term care" OR "hospital acquired" OR "healtcare associated") AND (infection OR resistance OR resistant)) AND (2010[PDAT]:2020[PDAT])')$count
[1] 7157
The same query on https://pubmed.ncbi.nlm.nih.gov/ returns 9263 results.

Not sure if you still need this, but I'm answering in case someone else has the same problem.
I had the same issue and found something that might be useful in a GitHub issue.
It seems that the API service needs to be updated to match the new web service, but it's been a year now and no official announcement has been made.
The easyPubMed author provides an alternative. Hope this is what you were looking for.
easyPubMed Issue
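One way to see where the two access methods diverge is to ask the E-utilities esearch endpoint directly for its count and query translation, then compare that text with the translation reported by the web interface. A minimal Python sketch, assuming the esearch JSON output exposes `count` and `querytranslation` under `esearchresult`; the helper names are made up for illustration:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, db="pubmed"):
    # retmax=0: only the count and translation are needed, not the ID list
    params = {"db": db, "term": term, "retmode": "json", "retmax": 0}
    return ESEARCH + "?" + urlencode(params)


def summarize(payload):
    # Assumed layout: {"esearchresult": {"count": "...", "querytranslation": "..."}}
    result = payload["esearchresult"]
    return int(result["count"]), result.get("querytranslation", "")


def fetch_summary(term):
    # Performs the actual network request against the API
    with urlopen(esearch_url(term)) as resp:
        return summarize(json.load(resp))
```

Calling `fetch_summary(term)` with the query from the question returns the API's count together with its translated query, which can then be compared term by term with what the website reports.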

Is there a way to find the work items for a release in the Azure DevOps API 5.1?

I can find a release in the Azure DevOps API 5.1 with a request to https://vsrm.dev.azure.com/mycompany/myproject/_apis/release/releases/myreleaseid?api-version=5.1
How can I get the work items of that release as shown on the DevOps portal under Deployment - Stages - Workitems?
My naive approach of just using https://vsrm.dev.azure.com/mycompany/myproject/_apis/release/releases/myreleaseid/workitems?api-version=5.1
resulted in a 404.
There is a stakeholder in the work item and I want to send him a notification after the release.
How can I get the workitems of that release as shown on the devops portal under Deployment - Stages - Workitems?
I couldn't find any documentation on this, so I used the browser's developer tools (F12) to find the request the portal itself makes. Here's the one I found:
GET https://vsrm.dev.azure.com/mycompany/myproject/_apis/Release/releases/myreleaseId/workitems?baseReleaseId={my baseReleaseId}&%24top=250&artifactAlias={my artifactAlias}
It returns the IDs of the work items for the release.
Once you have the IDs, you can get the details with Get Work Items Batch or a similar call.
In addition:
1. myreleaseId is the release ID. (On my side, the ID is 7 for Release-7.)
2. my artifactAlias is the alias of the artifact linked to the release.
3. For my baseReleaseId, I'm not 100% sure about its meaning. I think it could be something like ReleaseToCompareAgainst (hint from Daniel). On my side, with releaseId=7, using baseReleaseId=6 (7-1) returns the correct work item IDs. I suggest using F12 on that web page to check your corresponding URL.
And according to Mathias F: the baseReleaseId really is the last previous release that has a deployment (-1 in some cases).
4. To find REST API calls that may not be documented, open F12 on the portal page and watch the network requests it sends.
Hope all of the above helps :)
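To make the pieces of that request explicit, here is a small Python sketch that assembles the URL from its parts. The function name and argument names are hypothetical, and the endpoint itself is the undocumented one observed in the portal, so it may change without notice:

```python
from urllib.parse import quote, urlencode


def release_workitems_url(organization, project, release_id,
                          base_release_id, artifact_alias, top=250):
    # Undocumented endpoint seen in the portal's network traffic
    path = "https://vsrm.dev.azure.com/{}/{}/_apis/Release/releases/{}/workitems".format(
        quote(organization), quote(project), release_id)
    # urlencode escapes "$top" as "%24top", matching the observed request
    query = urlencode({
        "baseReleaseId": base_release_id,
        "$top": top,
        "artifactAlias": artifact_alias,
    })
    return path + "?" + query
```

For Release-7 compared against release 6, `release_workitems_url("mycompany", "myproject", 7, 6, "_build")` yields a URL of the same shape as the one above (`_build` is only a placeholder alias). The request still needs whatever authentication your organization uses.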

How to restrict fields returned by stackexchange api, and turn off paging?

I'd like to have a list of just the current titles for all questions on one of the smaller (fewer than 10,000 questions) Stack Exchange sites. I tried the interactive utility here: https://api.stackexchange.com/docs/questions. It reports the result as JSON at the bottom and shows the request URL at the top. For example:
https://api.stackexchange.com/2.2/questions?order=desc&sort=activity&tagged=apples&site=cooking
returns this JSON in my browser:
{"items":[{"tags":["apples","crumble"],"owner":{ ...
...
...],"has_more":true,"quota_max":300,"quota_remaining":252}
What is quota? It was 10,000 on one search on one site, but suddenly it's only 300 here.
I won't be doing this very often. What I'd like is the quickest way to edit that (or a similar) URL so I can get a list of all of the titles on a small site. I don't understand how to use paging, and I don't need any of the other fields. I don't mind getting them, but I'm thinking that if I exclude them I can fetch more at once.
If I need to script it, Python (2.7) is my preferred (only) language.
quota_max is the number of requests your application is allowed per day. 300 is the default for an unregistered application. This used to be mentioned directly on the page describing throttles, but seems to have been removed. Here is historical information describing the default.
To increase this to 10,000, you need to register an application and then authenticate by passing an access token in your script.
To get all titles on a site, you can use a Python library to help:
StackAPI. The answer below will use this library. DISCLAIMER: I wrote this library
Py-StackExchange
SEAPI
StackPy
Assuming you have registered your application and authenticated we can proceed.
First, install StackAPI (documentation):
pip install stackapi
This code will then grab the 10,000 most recent questions (max_pages * page_size) for the site hardwarerecs. Each page costs you one API hit, so the more items per page, the fewer API calls you use.
from stackapi import StackAPI
SITE = StackAPI('hardwarerecs')
SITE.page_size = 100
SITE.max_pages = 100
# Filter to only get question title and link
filter = '!BHMIbze0EQ*ved8LyoO6rNjkuLgHPR'
questions = SITE.fetch('questions', filter=filter)
In the questions variable is a dictionary that looks very similar to the API output, except that the library did all the paging for you. Your data is in questions['data'] and, in this case, contains a list of dictionaries that look like this:
[
...
{u'link': u'http://hardwarerecs.stackexchange.com/questions/29/sound-board-to-replace-a-gl2200-in-a-house-of-worship-foh-setting',
u'title': u'Sound board to replace a GL2200 in a house-of-worship FOH setting?'},
{ u'link': u'http://hardwarerecs.stackexchange.com/questions/31/passive-gps-tracker-logger',
u'title': u'Passive GPS tracker/logger'}
...
]
This result set is limited to only the title and the link because of the filter we applied. You can find the appropriate filter by adjusting what fields you want in the web UI and copying the filter field.
The hardwarerecs parameter that is passed when creating the SITE object is the first part of the site's domain URL. Alternatively, you can find it by looking at the api_site_parameter for your site in the output of the /sites endpoint.
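For anyone who wants to see what the library's paging amounts to, here is a rough sketch of the same loop without StackAPI, written for Python 3 rather than the 2.7 mentioned in the question. `page`, `pagesize`, `has_more` and `backoff` are fields of the API itself; `page_url`, `collect_titles` and the caller-supplied `fetch_page` function are hypothetical names for this example:

```python
import time
from urllib.parse import urlencode


def page_url(page, site, pagesize=100):
    # One request per page; 100 is the largest page size the API allows
    query = urlencode({"page": page, "pagesize": pagesize,
                       "order": "desc", "sort": "activity", "site": site})
    return "https://api.stackexchange.com/2.2/questions?" + query


def collect_titles(fetch_page, max_pages=100):
    """Gather question titles by following `has_more` page by page.

    `fetch_page(page)` must return the decoded JSON wrapper for that page,
    for example by requesting page_url(page, site) with any HTTP client.
    """
    titles = []
    for page in range(1, max_pages + 1):
        wrapper = fetch_page(page)
        titles.extend(item["title"] for item in wrapper.get("items", []))
        if not wrapper.get("has_more"):
            break
        # The API may ask clients to wait before sending the next request
        time.sleep(wrapper.get("backoff", 0))
    return titles
```

Keeping the HTTP call outside `collect_titles` makes the loop easy to try against canned pages before spending any of the daily quota.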

The prefix "atom" for element "atom:cc" is not bound exception

I am trying to fetch the contacts of users who have an account in Google Apps Marketplace. While fetching the contacts I get the following error:
com.google.gdata.util.ParseException: The prefix "atom" for element "atom:cc" is not bound.
at com.google.gdata.util.XmlParser.parse(XmlParser.java:695)
at com.google.gdata.util.XmlParser.parse(XmlParser.java:568)
at com.google.gdata.data.BaseFeed.parseAtom(BaseFeed.java:793)
at com.google.gdata.wireformats.input.AtomDataParser.parse(AtomDataParser.java:68)
at com.google.gdata.wireformats.input.AtomDataParser.parse(AtomDataParser.java:39)
at com.google.gdata.wireformats.input.CharacterParser.parse(CharacterParser.java:)
at com.google.gdata.wireformats.input.XmlInputParser.parse(XmlInputParser.java:52)
...
I am using the Java client library to fetch the contacts. Can you please let me know whether this is an issue in the Java client library? The problem has been there for a long time and I badly need a solution. What should I do to make it work? Any help would be greatly appreciated.
Thanks,
VijayRaj
I had the same problem with the .NET client that you have with the Java client.
After I contacted Google support, they told me that the arbitrary XML data stored in a contact's Property element cannot be parsed by my version of GData.
There is a time-intensive workaround, deleting and recreating the contacts, but that's probably not what you are looking for; it wasn't for me either.
After switching to the Python implementation, everything works fine now.
Check out this issue report: Issue 361

How can I add "current streak" of contributions from github to my blog?

I have a personal blog I built using Rails. I want to add a section to my site that displays my current streak of GitHub contributions. What would be the best way to go about doing this?
edit: for clarification, here is what I want:
just the number of days is all that is necessary for me.
Considering the GitHub API for Users doesn't yet expose that particular information (number of days in the current streak of contributions), you might have to:
scrape it (extract it by reading the user's GitHub page)
As klamping mentions in his answer (upvoted), the URL to scrape would be:
https://github.com/users/<username>/contributions_calendar_data
https://github.com/users/<username>/contributions
(for public repos only, though)
SherlockStd has updated (May 2017) parsing code below:
https://github-stats.com/api/user/streak/current/:username
try projects which are using https://github.com/users/<username>/contributions_calendar_data (as listed in Marques Johansson's answer, upvoted)
IonicaBizau/git-stats:
akerl/githubchart (Github contribution SVG generator)
akerl/githubstats (Github contribution statistics)
build that graph yourself: see the GitHub project git-cal
git-cal is a simple script to view commits calendar (similar to GitHub contributions calendar) on command line.
Each block in the graph corresponds to a day and is shaded with one of the 5 possible colors, each representing relative number of commits on that day.
or establish a service that will report, each day, any new commit for that given day to a Google Calendar (using the Google Calendar API through a project like nf/streak).
You can then read that information and report it in your blog.
You can find various examples of scraping that information:
github_team_calendar.py
weekend-commits.js
As in:
$.getJSON('https://github.com/users/' + location.pathname.replace(/\//g, '') + '/contributions_calendar_data', weekendWork);
leaderboard.rb:
Like:
leaderboard = members.map do |u|
user_stats = get("https://github.com/users/#{u}/contributions_calendar_data")
total = user_stats.map { |s| s[1] }.reduce(&:+)
[u, total]
end
... (you get the idea)
The URL for the plain JSON data was:
https://github.com/users/[username]/contributions_calendar_data
[Edit: Looks like this URL no longer works]
There is a URL which generates the SVG, which other answers have indicated. That is here:
https://github.com/users/[username]/contributions
Simply replace [username] with your GitHub username in the URL and you should be able to see the chart. See other answers for more in-depth explanations.
If you want something that matches the visual appearance of GitHub's chart, check out these projects which use https://github.com/users/<username>/contributions_calendar_data but also apply other factors based on GitHub's logic.
https://github.com/akerl/githubchart
https://github.com/akerl/githubstats
[Project deprecated and unavailable for now, will be back online soon.]
Since the URL https://github.com/users/<username>/contributions_calendar_data no longer works, you have to parse the SVG from https://github.com/users/<username>/contributions.
Unfortunately, GitHub loves security and CORS is disabled on their server.
To solve this issue, I've set up an API for myself and everyone who needs it. Just GET https://github-stats.com/api/user/streak/current/{username} (CORS allowed), and you'll get an answer like:
{
"success":true,
"currentStreak": 3
}
https://github-stats.com will soon implement more stats endpoints :)
Please ask for new endpoints at https://github.com/SherloxFR/github-stats.com/issues; it will be a pleasure to find a way to implement them!
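Whichever source the per-day counts come from (the parsed SVG or a service like the one above), turning them into a streak is a small loop. A minimal sketch in Python, assuming the counts have already been extracted into a date-to-count mapping; the function name and the rule that an empty "today" does not reset the streak are choices made for this example, not GitHub's documented logic:

```python
from datetime import date, timedelta


def current_streak(counts_by_day, today):
    """Count consecutive days with at least one contribution, ending today.

    `counts_by_day` maps datetime.date -> number of contributions.
    Assumption: if today has no contributions yet, the streak is measured
    up to yesterday instead of being reset to zero.
    """
    day = today
    if counts_by_day.get(day, 0) == 0:
        day -= timedelta(days=1)
    streak = 0
    # Walk backwards until the first day without contributions
    while counts_by_day.get(day, 0) > 0:
        streak += 1
        day -= timedelta(days=1)
    return streak
```

The resulting number can be cached once a day on the blog's server side, which also sidesteps the CORS restriction mentioned above.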

REST interface usage for multiple resources

I am currently adding a REST API over HTTP to an online service and I am confronted with a very simple problem for which I cannot find an answer that satisfies me:
I have mainly 2 resources: 'user' and 'reports'. As you would have guessed, reports are associated with users (each with one and only one, i.e. a foreign key in my db).
Anyway, I have this URL mapping for GET:
mywebsite/api/users/{id} : returns a user and related information, or a list of users if id is not present
mywebsite/api/report/{id} : returns a report and related information, or a list of reports if id is not present
Now I would like to get the reports for a specific user. My way of doing it now is to add an optional parameter to the GET method for reports, ?username={username}, and if it is present I filter the results to return only the reports for this user.
I can't help but think something is wrong... if I start doing things like this, my methods handling GET will be full of if/else looking for missing parameters...
Other solutions I thought of are:
incorporate the reports in the resulting GET on mywebsite/api/users/{id}, but I have a great many reports, so in the end it will become really bad...
map another URL just for this function, but it just doesn't feel right...
I am just getting to grips with this REST thing. I like the concept, but a little explanation on this matter would really help me understand it better.
Thanks
Edit:
It seems I have hit a common problem in the REST world: I have tied my resources to a model. If you tie a resource to a model, you end up having trouble with aggregate attributes.
Some guy describes this error here http://jacobian.org/writing/rest-worst-practices/ but I have yet to understand how to manage it the way he describes...
FYI, I am using django/piston, but this question should be answerable regardless of language.
I can't help but think something is wrong...
The only thing you're doing wrong is thinking that your URI structure makes your application more or less RESTful. The original REST literature never says that query strings are bad. People tend to get hung up on URI structure and seem to think that your URIs must be structured a certain way to be considered RESTful. There is nothing wrong with using ?username=<username>. A URI is just an ID (though some can be more human friendly than others).
Bottom line: don't get hung up on how your URIs look. There are much more important things to focus on (promoting hyperlinking/hypermedia, sticking to a uniform interface - typically HTTP, cacheability, etc.).
This may be a bit of a digression but, as for your comment about the coupling of resources to models, you're still okay. If you do go the /reports/ID/user route, just think of 'user' as a relationship name on your reports model. Surely your model defines the relationship between a report and a user. You can just parse the last part of your URI so that it matches the name of this relationship. In the case of a one-to-one relationship like the one you describe, it's always a good idea to also set the Content-Location header to match the canonical URI of the user.
For example, say report 123 belongs to user 1. You now have two ways of referring to this user:
http://example.com/reports/123/user
http://example.com/user/1
For the first URI, it would also be a good idea to set the Content-Location: http://example.com/user/1 header.
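As a rough illustration of that header, here is a framework-free Python sketch of a handler for the relationship URI. The function name, the in-memory dictionaries and the `user_id` field are all made up for the example; in django/piston the same idea would live in the handler that serves /reports/ID/user:

```python
def get_report_user(report_id, reports, users, base="http://example.com"):
    """Handle GET /reports/<report_id>/user; returns (status, headers, body)."""
    report = reports.get(report_id)
    if report is None:
        return 404, {}, None
    user_id = report["user_id"]
    # Point clients at the canonical URI of the user being returned
    headers = {"Content-Location": "{}/user/{}".format(base, user_id)}
    return 200, headers, users[user_id]
```

A client that fetched http://example.com/reports/123/user then learns from the response that the same representation is also available at http://example.com/user/1.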
Here's how I would implement this:
mywebsite/api/users : returns a list of users
mywebsite/api/users/{id} : returns a user and related information if user exists, otherwise 404
mywebsite/api/users/{id}/reports : returns reports for a specific user if exists, otherwise 404
mywebsite/api/users/{id}/reports/{id} : returns specific report for a specific user if exists, otherwise 404
mywebsite/api/reports : returns a list of reports
mywebsite/api/reports/{id} : returns a report and related information if exists, otherwise 404
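The mapping above can be sketched as one small handler per URL, which avoids the if/else chains on optional parameters that the question worries about. This is only an illustration in plain Python with made-up in-memory data and handler names, not piston's API:

```python
import re

# Hypothetical sample data standing in for the database
USERS = {1: {"name": "alice"}}
REPORTS = {123: {"user_id": 1, "title": "Q1"},
           124: {"user_id": 1, "title": "Q2"}}


def get_user(user_id):
    return (200, USERS[user_id]) if user_id in USERS else (404, None)


def get_report(report_id):
    return (200, REPORTS[report_id]) if report_id in REPORTS else (404, None)


def get_user_reports(user_id):
    if user_id not in USERS:
        return 404, None
    return 200, {rid: r for rid, r in REPORTS.items() if r["user_id"] == user_id}


def get_user_report(user_id, report_id):
    report = REPORTS.get(report_id)
    # 404 unless the report exists and belongs to this user
    if user_id not in USERS or report is None or report["user_id"] != user_id:
        return 404, None
    return 200, report


ROUTES = [
    (r"/api/users", lambda: (200, USERS)),
    (r"/api/users/(\d+)", get_user),
    (r"/api/users/(\d+)/reports", get_user_reports),
    (r"/api/users/(\d+)/reports/(\d+)", get_user_report),
    (r"/api/reports", lambda: (200, REPORTS)),
    (r"/api/reports/(\d+)", get_report),
]


def resolve(path):
    """Return (status, body) for a GET on the given path."""
    for pattern, handler in ROUTES:
        match = re.fullmatch(pattern, path)
        if match:
            return handler(*(int(g) for g in match.groups()))
    return 404, None
```

Each handler deals with exactly one resource shape, so adding a new URL means adding a row to `ROUTES` rather than another branch inside an existing method.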
HTH,
-aj