similar pubmed articles via pubmed api - pubmed

Is it possible to obtain similar PubMed articles given a PMID? For example, this link shows similar articles on the right-hand side.

You can do it with Biopython using the NCBI Entrez API. The command you are looking for is neighbor_score. Alternatively, you can get the data directly via the E-utilities URL (see the sketch after the code below).
from Bio import Entrez

Entrez.email = "Your.Name.Here@example.org"
handle = Entrez.elink(db="pubmed", id="26998445", cmd="neighbor_score", rettype="xml")
records = Entrez.read(handle)
# sort the related articles by their similarity score (ascending)
scores = sorted(records[0]['LinkSetDb'][0]['Link'], key=lambda k: int(k['Score']))

# show the top 5 results
for i in range(1, 6):
    handle = Entrez.efetch(db="pubmed", id=scores[-i]['Id'], rettype="xml")
    record = Entrez.read(handle)
    print(record)
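If you would rather not use Biopython, the same scores are available straight from the ELink E-utility URL, which is what Biopython calls under the hood. Here is a rough sketch using requests and the standard-library XML parser, mirroring the top-5 loop above; treat it as an illustration of the endpoint, not the only way to call it:
import requests
import xml.etree.ElementTree as ET

url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"
params = {
    "dbfrom": "pubmed",       # source database
    "db": "pubmed",           # target database
    "cmd": "neighbor_score",  # related articles with similarity scores
    "id": "26998445",         # the starting PMID
}
response = requests.get(url, params=params)
root = ET.fromstring(response.text)

# each related article comes back as <Link><Id>...</Id><Score>...</Score></Link>
linkset = root.find(".//LinkSetDb")
links = [(link.findtext("Id"), int(link.findtext("Score"))) for link in linkset.findall("Link")]
for pmid, score in sorted(links, key=lambda pair: pair[1], reverse=True)[:5]:
    print(pmid, score)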

Related

azure-devops export work items as csv and include ALL comments/discussions

If I make a query in Azure DevOps for my work items, when I add columns to display there are only the comment count and the discussion, but is there any way to include all the comments from a work item in my saved CSV output? I need to archive the comments for every work item in a CSV.
Can I do this through the Azure DevOps web UI somehow, or do I need to write my own script using the Azure API to read the comments and add them to a work item?
For your first question, the answer is no: since you want all the comments for a work item, the built-in UI features cannot achieve that.
For your second question, the answer is yes.
from azure.devops.connection import Connection
from msrest.authentication import BasicAuthentication
import requests
import csv
import os

# get all the comments of a work item
def get_work_items_comments(wi_id):
    # get a connection to Azure DevOps
    organization_url = 'https://dev.azure.com/xxx'
    personal_access_token = 'xxx'
    credentials = BasicAuthentication('', personal_access_token)
    connection = Connection(base_url=organization_url, creds=credentials)
    work_item_tracking_client = connection.clients.get_work_item_tracking_client()
    # get the work item
    work_item = work_item_tracking_client.get_work_item(wi_id)
    # get the comments link of the work item
    comments_ref = work_item._links.additional_properties['workItemComments']['href']
    # send a request to get the comments
    response = requests.get(comments_ref, auth=('', personal_access_token))
    # get the comments
    comments = response.json()['comments']
    return comments

# return the work item id, work item title and related work item comments
def get_work_items_results(wi_id):
    # get a connection to Azure DevOps
    organization_url = 'https://dev.azure.com/xxx'
    personal_access_token = 'xxx'
    credentials = BasicAuthentication('', personal_access_token)
    connection = Connection(base_url=organization_url, creds=credentials)
    work_item_tracking_client = connection.clients.get_work_item_tracking_client()
    # get the work item
    work_item = work_item_tracking_client.get_work_item(wi_id)
    # get the title of the work item
    title = work_item.fields['System.Title']
    # get the work item id
    id = work_item.id
    # get the comments of the work item
    items = get_work_items_comments(wi_id)
    array_string = []
    for item in items:
        text = item['text']
        array_string.append(text)
        print(item['text'])
    return id, title, array_string

# Save the work item id, work item title and related work item comments to a csv file
# create the folder workitemresults if it does not exist
if not os.path.exists('workitemresults'):
    os.makedirs('workitemresults')

with open('workitemresults/comments_results.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Workitem ID', 'Workitem Title', 'Workitem Comments'])
    # === if you want multiple work items, just add a for loop at this point and replace the value 120; 120 is the work item id on my side ===
    writer.writerow(get_work_items_results(120))
The above code captures the comments and information for one work item, and it works fine on my side. The place where a for loop should go is already marked in the code above; putting a loop there lets you capture multiple work items, as sketched below.
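A minimal sketch of that loop, reusing get_work_items_results from the code above; work_item_ids is just a placeholder list of the IDs you want to archive (it could equally come from a query):
# placeholder list of work item IDs to archive; replace with your own IDs
work_item_ids = [120, 121, 122]

with open('workitemresults/comments_results.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Workitem ID', 'Workitem Title', 'Workitem Comments'])
    for wi_id in work_item_ids:
        writer.writerow(get_work_items_results(wi_id))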
In my situation I only want the text content, so I strip the wrapping tags from each comment:
#remove <div> and </div> from the text
text = text.replace('<div>','')
text = text.replace('</div>','')
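Comment bodies can contain more HTML than just <div> tags, so if you want plain text in the CSV, one possible generalization is to strip all tags with a small regular expression (a crude sketch that ignores HTML entities):
import re

def strip_html_tags(html_text):
    # crude tag stripper: removes anything between '<' and '>'
    return re.sub(r'<[^>]+>', '', html_text)

print(strip_html_tags('<div>Looks <b>good</b> to me</div>'))  # Looks good to me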

JSONDecodeError: Expecting value: line 1 column 1 (char 0) while getting data from Pokemon API

I am trying to scrape the pokemon API and create a dataset for all pokemon. So I have written a function which looks like this:
import requests
import json
import pandas as pd

def poke_scrape(x, y):
    '''
    A function that takes in a range of pokemon (based on pokedex ID) and returns
    a pandas dataframe with information related to the pokemon using the Poke API
    '''
    # GATHERING THE DATA FROM THE API
    url = 'https://pokeapi.co/api/v2/pokemon/'
    ids = range(x, (y + 1))
    pkmn = []
    for id_ in ids:
        url = 'https://pokeapi.co/api/v2/pokemon/' + str(id_)
        pages = requests.get(url).json()
        # content = json.dumps(pages, indent=4, sort_keys=True)
        if 'error' not in pages:
            pkmn.append([pages['id'], pages['name'], pages['abilities'], pages['stats'], pages['types']])

    # MAKING A DATAFRAME FROM THE GATHERED API DATA
    cols = ['id', 'name', 'abilities', 'stats', 'types']
    df = pd.DataFrame(pkmn, columns=cols)
    return df
The code works fine for most Pokémon. However, when I try to run poke_scrape(229, 229) (so loading ONLY the 229th Pokémon), it gives me the JSONDecodeError from the title: Expecting value: line 1 column 1 (char 0).
So far I have tried using json.loads() instead, but that has not solved the issue. What is even more perplexing is that this specific Pokémon has loaded before, and the same issue has occurred with another ID; otherwise I could just manually enter the stats for the Pokémon that fails to load into my dataframe. Any help is appreciated!
Because of the way the PokeAPI works, some links to the JSON data for a Pokémon only load when the URL ends with a '/' (for example https://pokeapi.co/api/v2/pokemon/229/ vs https://pokeapi.co/api/v2/pokemon/229: the first link works while the second returns "Not Found"). However, other IDs respond with an error because of the added '/', so I fixed the issue with a few if statements right after the start of the for loop at the beginning of the function.
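A rough sketch of what such a check could look like; the retry logic here is my interpretation rather than the answerer's verbatim code. It tries the URL without a trailing slash first and falls back to the slash-terminated form when the response is not usable JSON:
import requests

def fetch_pokemon(id_):
    # try the URL without a trailing slash first, then retry with one
    base = 'https://pokeapi.co/api/v2/pokemon/' + str(id_)
    for url in (base, base + '/'):
        response = requests.get(url)
        if response.status_code == 200:
            try:
                return response.json()
            except ValueError:  # JSONDecodeError is a subclass of ValueError
                continue
    return None  # neither form of the URL returned usable JSON

pages = fetch_pokemon(229)
if pages is not None and 'error' not in pages:
    print(pages['name'])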

How to get data from the Australian Bureau of Statistics using pandasdmx

Has anyone managed to get ABS data using the pandasdmx library?
Here is the code to get data from the European Central Bank (ECB), which works:
from pandasdmx import Request
ecb = Request('ECB')
flow_response = ecb.dataflow()
print(flow_response.write().dataflow.head())
exr_flow = ecb.dataflow('EXR')
dsd = exr_flow.dataflow.EXR.structure()
data_response = ecb.data(resource_id='EXR', key={'CURRENCY': ['USD', 'JPY']}, params={'startPeriod': '2016'})
However, when I change Request('ECB') to Request('ABS'), an error pops up on the second line saying:
"{ValueError}This agency only supports requests for data, not dataflow."
Is there a way to get data from ABS?
documentation for pandasdmx: https://pandasdmx.readthedocs.io/en/stable/usage.html#basic-usage
Hope this will help
from pandasdmx import Request

Agency_Code = 'ABS'
Dataset_Id = 'ATSI_BIRTHS_SUMM'
ABS = Request(Agency_Code)
data_response = ABS.data(resource_id=Dataset_Id, params={'startPeriod': '2016'})
# This will result in a stacked DataFrame
df = data_response.write(data_response.data.series, parse_time=False)
# A flat DataFrame
flat_df = data_response.write().unstack().reset_index()
The Australian Bureau of Statistics (ABS) only offers an SDMX-JSON API; it doesn't serve SDMX-ML messages like the other providers do. That's why the dataflow feature isn't supported for it.
Please read for further reference: https://pandasdmx.readthedocs.io/en/stable/agencies.html#pre-configured-data-providers
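If you want code that works across agencies, one possible approach (a sketch based on the ValueError quoted in the question, using the ABS dataset ID from above) is to guard the dataflow call and fall back to requesting the data directly:
from pandasdmx import Request

agency = Request('ABS')
try:
    # works for SDMX-ML providers such as ECB, fails for SDMX-JSON-only providers
    flows = agency.dataflow()
    print(flows.write().dataflow.head())
except ValueError:
    # ABS only supports data requests, so go straight to the dataset
    data_response = agency.data(resource_id='ATSI_BIRTHS_SUMM', params={'startPeriod': '2016'})
    df = data_response.write(data_response.data.series, parse_time=False)
    print(df.head())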

Type Error: Result Set Is Not Callable - BeautifulSoup

I am having a problem with web scraping. I am trying to learn how to do it, but I can't seem to get past some of the basics. The error I'm getting is "TypeError: 'ResultSet' object is not callable".
I've tried a number of different things. I was originally trying to use "find" instead of the "find_all" function, but I was having an issue with BeautifulSoup returning a NoneType. I was unable to write an if check that could handle that, so I tried using "find_all" instead.
import requests
from bs4 import BeautifulSoup

page = requests.get('https://topworkplaces.com/publication/ocregister/')
soup = BeautifulSoup(page.text, 'html.parser')
all_company_list = soup.find_all(class_='sortable-table')
# all_company_list = soup.find(class_='sortable-table')

company_name_list_items = all_company_list('td')  # this call triggers the TypeError
for company_name in company_name_list_items:
    # print(company_name.prettify())
    companies = company_name.content[0]
I'd like this to pull in all the companies in Orange County, California that are on this list, in a clean manner. As you can see, I've already managed to pull them in, but I want the list to be clean.
You've got the right idea. Instead of immediately finding all the <td> tags (which returns one <td> for each of the 140 rows times each of the 4 columns in a row), if you only want the company names it might be easier to find all the rows (<tr> tags) and then pick however many columns you want by iterating over the <td>s in each row.
This will get the first column, the company names:
import requests
from bs4 import BeautifulSoup
page = requests.get('https://topworkplaces.com/publication/ocregister/')
soup = BeautifulSoup(page.text,'html.parser')
all_company_list = soup.find_all('tr')
company_list = [c.find('td').text for c in all_company_list[1::]]
Now company_list contains all 140 company names:
>>> len(company_list)
140
>>> company_list
['Advanced Behavioral Health', 'Advanced Management Company & R³ Construction Services, Inc.',
...
, 'Wes-Tec, Inc', 'Western Resources Title Company', 'Wunderman', 'Ytel, Inc.', 'Zillow Group']
Change c.find('td') to c.find_all('td') and iterate that list to get all the columns for each company.
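A quick sketch of that variation, which keeps every column of every row (the slice skips the header row):
import requests
from bs4 import BeautifulSoup

page = requests.get('https://topworkplaces.com/publication/ocregister/')
soup = BeautifulSoup(page.text, 'html.parser')

rows = soup.find_all('tr')
# one list per company, holding the text of every column in that row
companies = [[td.text for td in row.find_all('td')] for row in rows[1:]]
print(companies[:3])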
Pandas:
Pandas is often useful here. The page supports multiple sorts, including company size and rank; I show the rank sort.
import pandas as pd
table = pd.read_html('https://topworkplaces.com/publication/ocregister/')[0]
table.columns = table.iloc[0]
table = table[1:]
table.Rank = pd.to_numeric(table.Rank)
rank_sort_table = table.sort_values(by='Rank', axis=0, ascending = True)
rank_sort_table.reset_index(inplace=True, drop=True)
rank_sort_table.columns.names = ['Index']
print(rank_sort_table)
Depending on your sort, companies in order:
print(rank_sort_table.Company)
Requests:
Incidentally, you can use nth-of-type to select just the first column (the company names) and use the id, rather than the class name, to identify the table, as it is faster:
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('https://topworkplaces.com/publication/ocregister/')
soup = bs(r.content, 'lxml')
names = [item.text for item in soup.select('#twpRegionalList td:nth-of-type(1)')]
print(names)
Note the default sorting is alphabetical on name column rather than rank.
Reference:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html

How to order a json file list?

My goal is to get specific data on many profiles on khanacademy by using their API.
My problem is that in their API the JSON files have different list orders; the order can vary from one profile to another.
Here is my code:
from urllib.request import urlopen
import json

# here is a list with two json file links:
profiles = ['https://www.khanacademy.org/api/internal/user/kaid_329989584305166460858587/profile/widgets?lang=en&_=190424-1429-bcf153233dc9_1556201931959',
            'https://www.khanacademy.org/api/internal/user/kaid_901866966302088310331512/profile/widgets?lang=en&_=190424-1429-bcf153233dc9_1556201931959']

# for each json file, take some specific data out
for profile in profiles:
    print(profile)
    with urlopen(profile) as response:
        source = response.read()
    data = json.loads(source)
    votes = data[1]['renderData']['discussionData']['statistics']['votes']
    print(votes)
I expected something like this:
https://www.khanacademy.org/api/internal/user/kaid_329989584305166460858587/profile/widgets?lang=en&_=190424-1429-bcf153233dc9_1556201931959
100
https://www.khanacademy.org/api/internal/user/kaid_901866966302088310331512/profile/widgets?lang=en&_=190424-1429-bcf153233dc9_1556201931959
41
Instead I got an error:
https://www.khanacademy.org/api/internal/user/kaid_329989584305166460858587/profile/widgets?lang=en&_=190424-1429-bcf153233dc9_1556201931959
100
https://www.khanacademy.org/api/internal/user/kaid_901866966302088310331512/profile/widgets?lang=en&_=190424-1429-bcf153233dc9_1556201931959
Traceback (most recent call last):
  File "bitch.py", line 12, in <module>
    votes = data[1]['renderData']['discussionData']['statistics']['votes']
KeyError: 'discussionData'
As we can see:
This link A is working fine: https://www.khanacademy.org/api/internal/user/kaid_329989584305166460858587/profile/widgets?lang=en&_=190424-1429-bcf153233dc9_1556201931959
But this link B is not working: https://www.khanacademy.org/api/internal/user/kaid_901866966302088310331512/profile/widgets?lang=en&_=190424-1429-bcf153233dc9_1556201931959 That's because in this JSON file the list is not in the same order as it is in link A.
My question is: why? And how can I write my script to take these variations in order into account?
There is probably something to do with .sort(), but I am missing something.
Maybe I should also mention that I am using Python 3.7.2.
Link A: the desired data is in the second item of the list; link B: the desired data is in the third item of the list.
You could use an if test to check whether 'votes' appears in the dictionary at the current index:
import requests

urls = ['https://www.khanacademy.org/api/internal/user/kaid_329989584305166460858587/profile/widgets?lang=en&_=190424-1429-bcf153233dc9_1556201931959',
        'https://www.khanacademy.org/api/internal/user/kaid_901866966302088310331512/profile/widgets?lang=en&_=190424-1429-bcf153233dc9_1556201931959']

for url in urls:
    r = requests.get(url).json()
    result = [item['renderData']['discussionData']['statistics']['votes'] for item in r if 'votes' in str(item)]
    print(result)
Catching exceptions in Python doesn't carry much overhead, unlike in some other languages, so I would recommend the "better to ask forgiveness than permission" solution. This will be slightly faster than searching through a str for the word 'votes', as it fails instantly if the key is missing.
import requests

urls = ['https://www.khanacademy.org/api/internal/user/kaid_329989584305166460858587/profile/widgets?lang=en&_=190424-1429-bcf153233dc9_1556201931959',
        'https://www.khanacademy.org/api/internal/user/kaid_901866966302088310331512/profile/widgets?lang=en&_=190424-1429-bcf153233dc9_1556201931959']

for url in urls:
    response = requests.get(url).json()
    result = []
    for item in response:
        try:
            result.append(item['renderData']['discussionData']['statistics']['votes'])
        except KeyError:
            pass  # could not find votes
    print(result)