Azure DevOps: export work items as CSV and include ALL comments/discussions

If I make a query in Azure DevOps for my work items, when I add columns to display there is only a comment count and a discussion column. Is there any way to include all comments from a work item in my saved CSV output? I need to archive the comments for every work item in a CSV.
Can I do this through the Azure DevOps web UI somehow, or do I need to write my own script using the Azure API to read the comments and add them to the output?

Can I do this through the Azure DevOps web UI somehow, or do I need to write my own script using the Azure API to read the comments and add them to the output?
For your first question, the answer is no: the built-in UI and query export cannot include the full comment text of a work item, only the comment count.
For your second question, the answer is yes.
from azure.devops.connection import Connection
from msrest.authentication import BasicAuthentication
import requests
import csv
import os

#get all the comments of a work item
def get_work_items_comments(wi_id):
    #get a connection to Azure DevOps
    organization_url = 'https://dev.azure.com/xxx'
    personal_access_token = 'xxx'
    credentials = BasicAuthentication('', personal_access_token)
    connection = Connection(base_url=organization_url, creds=credentials)
    work_item_tracking_client = connection.clients.get_work_item_tracking_client()
    #get the work item
    work_item = work_item_tracking_client.get_work_item(wi_id)
    #get the comments link of the work item
    comments_ref = work_item._links.additional_properties['workItemComments']['href']
    #send a request to get the comments
    response = requests.get(comments_ref, auth=('', personal_access_token))
    #get the comments
    comments = response.json()['comments']
    return comments
#return work item id, work item title and related work item comments
def get_work_items_results(wi_id):
    #get a connection to Azure DevOps
    organization_url = 'https://dev.azure.com/xxx'
    personal_access_token = 'xxx'
    credentials = BasicAuthentication('', personal_access_token)
    connection = Connection(base_url=organization_url, creds=credentials)
    work_item_tracking_client = connection.clients.get_work_item_tracking_client()
    #get the work item
    work_item = work_item_tracking_client.get_work_item(wi_id)
    #get the title of the work item
    title = work_item.fields['System.Title']
    #get the work item id
    id = work_item.id
    #get the comments of the work item
    items = get_work_items_comments(wi_id)
    array_string = []
    for item in items:
        text = item['text']
        array_string.append(text)
        print(item['text'])
    return id, title, array_string
#save the work item id, work item title and related work item comments to a csv file
#create the folder workitemresults if it does not exist
if not os.path.exists('workitemresults'):
    os.makedirs('workitemresults')

with open('workitemresults/comments_results.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Workitem ID', 'Workitem Title', 'Workitem Comments'])
    #===if you want multiple work items, add a for loop here and replace the value 120; 120 is the work item id on my side===
    writer.writerow(get_work_items_results(120))
The above code captures the comments and information for a single work item and works fine on my side. (I already marked in the code where the for loop should go; with a loop in that place you can capture multiple work items, as in the sketch below.)
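A minimal sketch of that loop, assuming the work item IDs are already known (the IDs below are made up; you could also collect them from a WIQL query):

work_item_ids = [120, 121, 122]  #hypothetical ids - replace with your own
with open('workitemresults/comments_results.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Workitem ID', 'Workitem Title', 'Workitem Comments'])
    #one csv row per work item
    for wi_id in work_item_ids:
        writer.writerow(get_work_items_results(wi_id))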
In my situation, if I only want the text content:
#remove <div> and </div> from the text
text = text.replace('<div>','')
text = text.replace('</div>','')
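If the comments contain more markup than <div> tags, a small helper that strips all tags may be more robust; a sketch using a regular expression, to be adjusted to the markup you actually see:

import re

def strip_html(text):
    #remove any html tag such as <div>, <br> or <span> from the comment text
    return re.sub(r'<[^>]+>', '', text)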

Related

JSONDecodeError: Expecting value: line 1 column 1 (char 0) while getting data from Pokemon API

I am trying to scrape the pokemon API and create a dataset for all pokemon. So I have written a function which looks like this:
import requests
import json
import pandas as pd

def poke_scrape(x, y):
    '''
    A function that takes in a range of pokemon (based on pokedex ID) and returns
    a pandas dataframe with information related to the pokemon using the Poke API
    '''
    #GATHERING THE DATA FROM THE API
    url = 'https://pokeapi.co/api/v2/pokemon/'
    ids = range(x, (y+1))
    pkmn = []
    for id_ in ids:
        url = 'https://pokeapi.co/api/v2/pokemon/' + str(id_)
        pages = requests.get(url).json()
        # content = json.dumps(pages, indent = 4, sort_keys=True)
        if 'error' not in pages:
            pkmn.append([pages['id'], pages['name'], pages['abilities'], pages['stats'], pages['types']])
    #MAKING A DATAFRAME FROM THE GATHERED API DATA
    cols = ['id', 'name', 'abilities', 'stats', 'types']
    df = pd.DataFrame(pkmn, columns=cols)
    return df
The code works fine for most pokemon. However, when I try to run poke_scrape(229, 229) (i.e. loading ONLY the 229th pokemon), it gives me the JSONDecodeError from the title.
So far I have tried using json.loads() instead, but that has not solved the issue. What is even more perplexing is that this specific pokemon has loaded before, and the same issue has also occurred with another ID; otherwise I could just manually enter the stats for the pokemon that fails to load into my dataframe. Any help is appreciated!
Because of the way the PokeAPI works, some links to the JSON data for a pokemon only load when the URL ends with a '/' (for example https://pokeapi.co/api/v2/pokemon/229/ works, while https://pokeapi.co/api/v2/pokemon/229 returns Not Found). Others, however, respond with an error when the '/' is added, so I fixed the issue with a few if statements right after the start of the for loop at the beginning of the function.
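A minimal sketch of that idea, reusing the loop from the question (the exact fallback checks are an assumption; adjust them to whatever your failing responses actually look like):

for id_ in ids:
    url = 'https://pokeapi.co/api/v2/pokemon/' + str(id_)
    response = requests.get(url)
    #fall back to the trailing-slash form if the first request did not succeed
    if response.status_code != 200 or not response.text.strip():
        response = requests.get(url + '/')
    try:
        pages = response.json()
    except ValueError:
        #neither form returned valid JSON; skip this id
        continue
    if 'error' not in pages:
        pkmn.append([pages['id'], pages['name'], pages['abilities'], pages['stats'], pages['types']])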

Django display data from json

I want to display cryptocurrency prices on my site, so I parse the latest BTC/USD price from coinmarketcap.com.
Now I want to display them in a list, but first I don't know how to save the symbol from the JSON to my database, and second I don't know how to display my view properly. Currently I only save the key:value pair for price_usd, where key is the name of the currency.
views.py
def crypto_ticker(request):
    list_prices = CryptoPrices.objects.get_queryset().order_by('-pk')
    paginator = Paginator(list_prices, 100)  # Show 100 prices per page
    page = request.GET.get('page')
    price = paginator.get_page(page)
    return render(request, 'MyProject/crypto_ticker.html', {'price': price})
urls.py
url(r'^crypto_ticker/$', MyProject_views.crypto_ticker, name='crypto_ticker'),
models.py
class CryptoPrices(models.Model):
    symbol = models.CharField(max_length=10)
    key = models.CharField(max_length=30)
    value = models.CharField(max_length=200)
celery update task:
@periodic_task(run_every=(crontab(minute='*/1')), name="Update Crypto rate(s)", ignore_result=True)
def get_exchange_rate():
    api_url = "https://api.coinmarketcap.com/v1/ticker/?limit=100"
    try:
        exchange_rates = requests.get(api_url).json()
        for exchange_rate in exchange_rates:
            CryptoPrices.objects.update_or_create(key=exchange_rate['id'],
                                                  defaults={'value': round(float(exchange_rate['price_usd']), 3)}
                                                  )
        logger.info("Exchange rate(s) updated successfully.")
    except Exception as e:
        print(e)
Surely just adding
symbol=exchange_rate['symbol']
to your update_or_create will work? The JSON from coinmarketcap sets that as a key in the dictionary, unless you want the image that they use?
In that case you would have to save copies of that image yourself, create a mapping from the text of the symbol to the image itself, and format that in your HTML output.
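Concretely, that is one extra entry in defaults, so the stored symbol is refreshed on every update; a sketch based on the task above (not tested against your models):

CryptoPrices.objects.update_or_create(
    key=exchange_rate['id'],
    defaults={
        'value': round(float(exchange_rate['price_usd']), 3),
        'symbol': exchange_rate['symbol'],
    }
)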

Scrape table from webpage when in <div> format - using Beautiful Soup

So I'm aiming to scrape 2 tables (in different formats) from a website - https://info.fsc.org/details.php?id=a0240000005sQjGAAU&type=certificate after using the search bar to iterate this over a list of license codes. I haven't included the loop fully yet but I added it at the top for completeness.
My issue is that because the two tables I want, Product Data and Certificate Data, are in two different formats, I have to scrape them separately. As the Product Data is in the normal "tr" format on the webpage, this bit is easy and I've managed to extract a CSV file of it. The harder bit is extracting the Certificate Data, as it is in "div" form.
I've managed to print the Certificate Data as a list of text using the class function, however I need to have it in tabular form saved in a CSV file. As you can see, I've tried several unsuccessful ways of converting it to CSV, but if you have any suggestions it would be much appreciated, thank you!! Also any other general tips to improve my code would be great too, as I am new to web scraping.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import requests
import pandas as pd

#namelist = open('example.csv', newline='', delimiter = 'example')
#for name in namelist:
#include all of the below

driver = webdriver.Chrome(executable_path="/Users/jamesozden/Downloads/chromedriver")
url = "https://info.fsc.org/certificate.php"
driver.get(url)

search_bar = driver.find_element_by_xpath('//*[@id="code"]')
search_bar.send_keys("FSC-C001777")
search_bar.send_keys(Keys.RETURN)
new_url = driver.current_url

r = requests.get(new_url)
soup = BeautifulSoup(r.content, 'lxml')

table = soup.find_all('table')[0]
df, = pd.read_html(str(table))

certificate = soup.find(class_='certificatecl').text
##certificate1 = pd.read_html(str(certificate))

driver.quit()

df.to_csv("Product_Data.csv", index=False)
##certificate1.to_csv("Certificate_Data.csv", index=False)
#print(df[0].to_json(orient='records'))
print(certificate)
Output:
Status
Valid
First Issue Date
2009-04-01
Last Issue Date
2018-02-16
Expiry Date
2019-04-01
Standard
FSC-STD-40-004 V3-0
What I want but over hundreds/thousands of license codes (I just manually created this one sample in Excel):
Desired output
EDIT
So whilst this is now working for the Certificate Data, I also want to scrape the Product Data and output that into another .csv file. However, currently it only prints 5 copies of the product data for the final license code, which is not what I want.
New Code:
df = pd.read_csv("MS_License_Codes.csv")
codes = df["License Code"]

def get_data_by_code(code):
    data = [
        ('code', code),
        ('submit', 'Search'),
    ]
    response = requests.post('https://info.fsc.org/certificate.php', data=data)
    soup = BeautifulSoup(response.content, 'lxml')
    status = soup.find_all("label", string="Status")[0].find_next_sibling('div').text
    first_issue_date = soup.find_all("label", string="First Issue Date")[0].find_next_sibling('div').text
    last_issue_date = soup.find_all("label", string="Last Issue Date")[0].find_next_sibling('div').text
    expiry_date = soup.find_all("label", string="Expiry Date")[0].find_next_sibling('div').text
    standard = soup.find_all("label", string="Standard")[0].find_next_sibling('div').text
    return [code, status, first_issue_date, last_issue_date, expiry_date, standard]

# Just insert here output filename and codes to parse...
OUTPUT_FILE_NAME = 'Certificate_Data.csv'
#codes = ['C001777', 'C001777', 'C001777', 'C001777']

df3 = pd.DataFrame()
with open(OUTPUT_FILE_NAME, 'w') as f:
    writer = csv.writer(f)
    for code in codes:
        print('Getting code# {}'.format(code))
        writer.writerow((get_data_by_code(code)))
        table = soup.find_all('table')[0]
        df1, = pd.read_html(str(table))
        df3 = df3.append(df1)

df3.to_csv('Product_Data.csv', index = False, encoding='utf-8')
Here's all you need.
No chromedriver. No pandas. Forget about them in the context of scraping.
import requests
import csv
from bs4 import BeautifulSoup

# This is all what you need for your task. Really.
# No chromedriver. Don't use it for scraping. EVER.
# No pandas. Don't use it for writing csv. It's not what pandas was made for.

#Function to parse single data page based on single input code.
def get_data_by_code(code):
    # Parameters to build POST-request.
    # "type" and "submit" params are static. "code" is your desired code to scrape.
    data = [
        ('type', 'certificate'),
        ('code', code),
        ('submit', 'Search'),
    ]
    # POST-request to gain page data.
    response = requests.post('https://info.fsc.org/certificate.php', data=data)
    # "soup" object to parse html data.
    soup = BeautifulSoup(response.content, 'lxml')
    # "status" variable. Contains first's found [LABEL tag, with text="Status"] following sibling DIV text. Which is status.
    status = soup.find_all("label", string="Status")[0].find_next_sibling('div').text
    # Same for issue dates... etc.
    first_issue_date = soup.find_all("label", string="First Issue Date")[0].find_next_sibling('div').text
    last_issue_date = soup.find_all("label", string="Last Issue Date")[0].find_next_sibling('div').text
    expiry_date = soup.find_all("label", string="Expiry Date")[0].find_next_sibling('div').text
    standard = soup.find_all("label", string="Standard")[0].find_next_sibling('div').text
    # Returning found data as list of values.
    return [response.url, status, first_issue_date, last_issue_date, expiry_date, standard]

# Just insert here output filename and codes to parse...
OUTPUT_FILE_NAME = 'output.csv'
codes = ['C001777', 'C001777', 'C001777', 'C001777']

with open(OUTPUT_FILE_NAME, 'w') as f:
    writer = csv.writer(f)
    for code in codes:
        print('Getting code# {}'.format(code))
        #Writing list of values to file as single row.
        writer.writerow((get_data_by_code(code)))
Everything is really straightforward here. I'd suggest you spend some time in the Chrome dev tools "Network" tab to get a better understanding of request forging, which is a must for scraping tasks.
In general, you don't need to run Chrome to click the "Search" button; you need to forge the request generated by that click. The same goes for any form and ajax.
Well... you should sharpen your skills (:
df3 = pd.DataFrame()
with open(OUTPUT_FILE_NAME, 'w') as f:
    writer = csv.writer(f)
    for code in codes:
        print('Getting code# {}'.format(code))
        writer.writerow((get_data_by_code(code)))
        ### HERE'S THE PROBLEM:
        # "soup" variable is declared inside of the "get_data_by_code" function.
        # So you can't use it in the outer context.
        table = soup.find_all('table')[0]  # <--- you should move this line to the
        # definition of the "get_data_by_code" function and return its value somehow...
        df1, = pd.read_html(str(table))
        df3 = df3.append(df1)
df3.to_csv('Product_Data.csv', index = False, encoding='utf-8')
For example, you can return a dictionary of values from the get_data_by_code function:
def get_data_by_code(code):
    ...
    table = soup.find_all('table')[0]
    row = [response.url, status, first_issue_date, last_issue_date, expiry_date, standard]
    return dict(row=row, table=table)
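The calling loop then pulls both pieces out of the returned dictionary; a sketch, assuming the rest of the loop stays as in your EDIT:

df3 = pd.DataFrame()
with open(OUTPUT_FILE_NAME, 'w') as f:
    writer = csv.writer(f)
    for code in codes:
        print('Getting code# {}'.format(code))
        result = get_data_by_code(code)
        #write the certificate fields as one csv row
        writer.writerow(result['row'])
        #accumulate the product data table for this code
        df1, = pd.read_html(str(result['table']))
        df3 = df3.append(df1)
df3.to_csv('Product_Data.csv', index=False, encoding='utf-8')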

similar pubmed articles via pubmed api

Is it possible to obtain similar PubMed articles given a PMID? For example, this link shows similar articles on the right-hand side.
You can do it with Biopython using the NCBI API. The command you are looking for is neighbor_score. Alternatively you can get the data directly via the URL, as sketched after the code below.
from Bio import Entrez

Entrez.email = "Your.Name.Here@example.org"
handle = Entrez.elink(db="pubmed", id="26998445", cmd="neighbor_score", rettype="xml")
records = Entrez.read(handle)
scores = sorted(records[0]['LinkSetDb'][0]['Link'], key=lambda k: int(k['Score']))

#show the top 5 results
for i in range(1, 6):
    handle = Entrez.efetch(db="pubmed", id=scores[-i]['Id'], rettype="xml")
    record = Entrez.read(handle)
    print(record)
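The "directly via the URL" route is a plain E-utilities elink request; a minimal sketch with requests (the parameters mirror the Biopython call above, but inspect the returned JSON layout before relying on it):

import requests

url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"
params = {
    "dbfrom": "pubmed",
    "db": "pubmed",
    "cmd": "neighbor_score",
    "id": "26998445",
    "retmode": "json",
}
#fetch the scored neighbours for the given pmid
data = requests.get(url, params=params).json()
print(data["linksets"])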

string search returns none or []

I have several URLs that I want to open to a specific place and search for a specific name, but I'm only getting None or [] returned.
I have searched but cannot see an answer that is pertinent to my code.
from bs4 import BeautifulSoup
from urllib import request

webpage = request.urlopen("http://www.dsfire.gov.uk/News/Newsdesk/IncidentsPast7days.cfm?siteCategoryId=3&T1ID=26&T2ID=35")
soup = BeautifulSoup(webpage)
incidents = soup.find(id="CollapsiblePanel1")

Links = []
for line in incidents.find_all('a'):
    Links.append("http://www.dsfire.gov.uk/News/Newsdesk/"+line.get('href'))

n = 0
e = len(Links)
while n < e:
    webpage = request.urlopen(Links[n])
    soup = BeautifulSoup(webpage)
    station = soup.find(id="IncidentDetailContainer")
    #search string
    print(soup.body.findAll(text='Ashburton'))
    n = n + 1
I know it's in the last link found on the page.
Thanks in advance for any ideas or comments.
If your output is just [], it means findAll returned a list. You then have to index into it: variable[index].
Try this one
print(soup.body.findAll(text='Ashburton')[0])
...where storing it into a variable first would be easier:
search = soup.body.findAll(text='Ashburton')
print(search[0])
This will bring you the first found item.
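Since some of your pages contain no match at all (which is exactly when you see []), it is safer to check the list before indexing; a small sketch:

search = soup.body.findAll(text='Ashburton')
if search:
    print(search[0])
else:
    print('No match on this page')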
For printing all found items you could loop over the result:
search = soup.body.findAll(text='Ashburton')
for entry in search:
    print(entry)
I really don't know BeautifulSoup well, so treat this as a rough sketch rather than a tested example.