Count the number of people having a property bounded by two numbers - json

The following code goes over the 10 pages of JSON returned by a GET request to the URL and checks how many records satisfy the condition that bloodPressureDiastole is between the specified limits. It does the job, but I was wondering if there is a better or cleaner way to achieve this in Python.
import urllib.request
import urllib.parse
import json

baseUrl = 'https://jsonmock.hackerrank.com/api/medical_records?page='
count = 0
for i in range(1, 11):
    url = baseUrl + str(i)
    f = urllib.request.urlopen(url)
    response = f.read().decode('utf-8')
    response = json.loads(response)
    lowerlimit = 110
    upperlimit = 120
    for elem in response['data']:
        bd = elem['vitals']['bloodPressureDiastole']
        if bd >= lowerlimit and bd <= upperlimit:
            count = count + 1
print(count)

There is no attribute-style access to JSON content because json.loads gives you a dict object (see the translation table in the json documentation). A dict provides access via the __getitem__ method (dict[key]) rather than __getattr__ (object.field), since keys may be any hashable objects, not only strings. Moreover, even string keys cannot serve as attribute names if they start with digits or clash with built-in dictionary methods.
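For example, a minimal illustration of this point (the JSON string here is made up for the demo):
import json

record = json.loads('{"vitals": {"bloodPressureDiastole": 115}}')
print(record["vitals"]["bloodPressureDiastole"])  # 115 - item access works
# record.vitals  # would raise AttributeError: 'dict' object has no attribute 'vitals'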
Despite this, you can define your own custom class implementing the desired behaviour for acceptable key names. json.loads has an object_hook argument, into which you may put any callable (a function or a class) that takes a dict as its sole argument (not only the resulting one, but every dict in the JSON, recursively) and returns something as the result. If your JSON matches some template, you can define a class with predefined fields for the JSON content, and even with methods, in order to get a robust Python object as part of your domain logic.
For instance, let's implement attribute access to the fields. I get the JSON content from the response.json method of requests, but it takes the same arguments as the json package. The comments in the code contain remarks about how to make your code more Pythonic.
from collections import Counter

from requests import get


class CustomJSON(dict):
    def __getattr__(self, key):
        return self[key]

    def __setattr__(self, key, value):
        self[key] = value


LOWER_LIMIT = 110  # Constants should be in uppercase.
UPPER_LIMIT = 120

base_url = 'https://jsonmock.hackerrank.com/api/medical_records'
# It is better to use dedicated tools for handling URLs
# in order to avoid possible exceptions in the future.
# By the way, your option would look clearer with an f-string,
# which can put values from variables (and not only variables) in place:
# url = f'https://jsonmock.hackerrank.com/api/medical_records?page={i}'

counter = Counter(normal_pressure=0)
# A plain integer would also do. A Counter is useful
# in case you need to count any other information as well.

for page_number in range(1, 11):
    records = get(
        base_url, params={"page": page_number}
    ).json(object_hook=CustomJSON)
    # Python has a pile of libraries for handling URL requests & responses.
    # urllib is a standard library rewritten from scratch for Python 3.
    # However, there is a more featured (connection pooling, redirects, proxies,
    # SSL verification etc.) & convenient third-party
    # (this is the only disadvantage) library: urllib3.
    # Built on top of it, requests provides an easier, more convenient & friendlier way
    # to work with URL requests. So I highly recommend using it
    # unless you are aiming for complex connections & URL processing.

    for elem in records.data:
        if LOWER_LIMIT <= elem.vitals.bloodPressureDiastole <= UPPER_LIMIT:
            counter["normal_pressure"] += 1

print(counter)
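As mentioned above, if your JSON matches a stable template you can go further than the generic CustomJSON and define a class with predefined fields and methods. A minimal sketch of that idea (the class name and method are illustrative only; only the field names come from the medical_records payload):
class MedicalRecord:
    """Domain object built from one element of the 'data' list."""

    def __init__(self, record: dict):
        self.diastole = record["vitals"]["bloodPressureDiastole"]

    def has_normal_pressure(self, lower: int, upper: int) -> bool:
        return lower <= self.diastole <= upper


# Usage with the records fetched above:
# normal = sum(
#     MedicalRecord(elem).has_normal_pressure(LOWER_LIMIT, UPPER_LIMIT)
#     for elem in records.data
# )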


StateAccessViolation: Value must be a literal - Vyper Ethereum smart contract

Version Information
vyper Version (output of vyper --version): 0.2.8+commit.069936f
OS: osx
Python Version (output of python --version): Python 2.7.16
Environment (output of pip freeze):
altgraph==0.10.2
bdist-mpkg==0.5.0
bonjour-py==0.3
macholib==1.5.1
matplotlib==1.3.1
modulegraph==0.10.4
numpy==1.8.0rc1
py2app==0.7.3
pyobjc-core==2.5.1
pyobjc-framework-Accounts==2.5.1
pyobjc-framework-AddressBook==2.5.1
pyobjc-framework-AppleScriptKit==2.5.1
pyobjc-framework-AppleScriptObjC==2.5.1
pyobjc-framework-Automator==2.5.1
pyobjc-framework-CFNetwork==2.5.1
pyobjc-framework-Cocoa==2.5.1
pyobjc-framework-Collaboration==2.5.1
pyobjc-framework-CoreData==2.5.1
pyobjc-framework-CoreLocation==2.5.1
pyobjc-framework-CoreText==2.5.1
pyobjc-framework-DictionaryServices==2.5.1
pyobjc-framework-EventKit==2.5.1
pyobjc-framework-ExceptionHandling==2.5.1
pyobjc-framework-FSEvents==2.5.1
pyobjc-framework-InputMethodKit==2.5.1
pyobjc-framework-InstallerPlugins==2.5.1
pyobjc-framework-InstantMessage==2.5.1
pyobjc-framework-LatentSemanticMapping==2.5.1
pyobjc-framework-LaunchServices==2.5.1
pyobjc-framework-Message==2.5.1
pyobjc-framework-OpenDirectory==2.5.1
pyobjc-framework-PreferencePanes==2.5.1
pyobjc-framework-PubSub==2.5.1
pyobjc-framework-QTKit==2.5.1
pyobjc-framework-Quartz==2.5.1
pyobjc-framework-ScreenSaver==2.5.1
pyobjc-framework-ScriptingBridge==2.5.1
pyobjc-framework-SearchKit==2.5.1
pyobjc-framework-ServiceManagement==2.5.1
pyobjc-framework-Social==2.5.1
pyobjc-framework-SyncServices==2.5.1
pyobjc-framework-SystemConfiguration==2.5.1
pyobjc-framework-WebKit==2.5.1
pyOpenSSL==0.13.1
pyparsing==2.0.1
python-dateutil==1.5
pytz==2013.7
scipy==0.13.0b1
six==1.4.1
xattr==0.6.4
This call of a for loop:
for i in range(self.some_uint256):
    # do something...
is throwing the error:
StateAccessViolation: Value must be a literal
full error output:
vyper.exceptions.StateAccessViolation: Value must be a literal
contract "vyper-farm/contracts/Farm.vy", function "_employ", line 152:4
151
---> 152 for i in range(self.num_employees):
-------------^
153 pass
What exactly am I doing wrong?
Is this a misunderstanding on my part as to what a literal is?
Look at the description of the range function; there is just one way to pass a variable to it:
for i in range(a, a + N):
    pass
In your case it should look like this (not sure that it will be useful):
num_employees: public(uint256)

@external
def __init__():
    self.num_employees = 16

@external
def do_smth():
    for i in range(self.num_employees, self.num_employees + 10):
        pass
The issue above is not a misunderstanding of the for loop's use; rather, it is the result of a coding style that is incompatible with Vyper's security measures.
The reason the for loop was being created was to make sure that when an 'employee' was 'fired' or 'quit', there wouldn't be an empty record left in the list of 'employee's that was being maintained.
Instead, in order to avoid using a for loop altogether, and with the small sacrifice of not being able to remove no-longer-'active' 'employee's, best practice is to just track each 'employee' in a hashmap:
idToEmployee: HashMap[uint256, employee]
and when tracking the employees, simply assign an id attribute to the 'employee' using a global variable called num_employees.
ex:
def employ():
    new_employee: employee = employee({
        id: self.num_employees
    })
    self.idToEmployee[self.num_employees] = new_employee
When attempting to view or update that employee's info, simply use:
self.idToEmployee[id]
@vladimir has a good explanation of how variables are passed to range, if there is still any confusion about for loops in Vyper for the reader.
In fact, you can't use variables in range() directly, but you can use another method.
Here is my advice:
for i in range(999999):
    if i < self.some_uint256:
        # do something...
        pass
    else:
        break

Consuming JSON API data with Django

From a Django app, I am able to consume data from a separate RESTful API, but what about filtering? The code below returns all books and their data. But what if I want to grab only books by an author, date, etc.? I want to pass an author's-name parameter, e.g. .../authors-name or /?author=name, and return only those books in the JSON response. Is this possible?
views.py
def get_books(request):
    response = requests.get('http://localhost:8090/book/list/').json()
    return render(request, 'books.html', {'response': response})
So is there a way to filter like a model object?
I can think of three ways of doing this:
Python's filter could be used with a bit of additional code.
QueryableList, which is the closest to an ORM for lists I've seen.
query-filter, which takes a more functional approach.
1. Built-in filter function
You can write a function that returns a function telling you whether a list element is a match, and then pass the generated function into filter.
def filter_pred_factory(**kwargs):
    def predicate(item):
        for key, value in kwargs.items():
            if key not in item or item[key] != value:
                return False
        return True
    return predicate


def get_books(request):
    books_data = requests.get('http://localhost:8090/book/list/').json()
    pred = filter_pred_factory(**request.GET)
    data_filter = filter(pred, books_data)
    # data_filter is cast to a list as a precaution
    # because it is a filter object,
    # which can only be iterated through once before it is exhausted.
    filtered_data = list(data_filter)
    return render(request, 'books.html', {'books': filtered_data})
2. QueryableList
QueryableList would achieve the same as the above, with some extra features. As well as /books?isbn=1933988673, you could use queries like /books?longDescription__icontains=linux. You can find other functionality here
from QueryableList import QueryableListDicts


def get_books(request):
    books_data = requests.get('http://localhost:8090/book/list/').json()
    queryable_books = QueryableListDicts(books_data)
    filtered_data = queryable_books.filter(**request.GET)
    return render(request, 'books.html', {'books': filtered_data})
3. query-filter
query-filter has similar features but doesn't copy the object-oriented approach of an ORM.
from query_filter import q_filter, q_items


def get_books(request):
    books_data = requests.get('http://localhost:8090/book/list/').json()
    data_filter = q_filter(books_data, q_items(**request.GET))
    # filtered_data is cast to a list as a precaution
    # because q_filter returns a filter object,
    # which can only be iterated through once before it is exhausted.
    filtered_data = list(data_filter)
    return render(request, 'books.html', {'books': filtered_data})
It's worth mentioning that I wrote query-filter.

How to dump all results of an API request when there is a page limit?

I am using an API to pull data from a URL; however, the API has a pagination limit. It goes like:
Page (default is 1 and it's the page number you want to retrieve)
Per_page (default is 100 and it's the maximum number of results returned in the response(max=500))
I have a script with which I can get the results of a single page, or a set number per page, but I want to automate it. I want to be able to loop through all the pages (or all results, 500 per page) and load them into a JSON file.
Here is my code that can get 500 results per_page:
import json, pprint
import requests

url = "https://my_api.com/v1/users?per_page=500"
header = {"Authorization": "Bearer <my_api_token>"}

s = requests.Session()
s.proxies = {"http": "<my_proxies>", "https": "<my_proxies>"}
resp = s.get(url, headers=header, verify=False)
raw = resp.json()

for x in raw:
    print(x)
The output is 500 results, but is there a way to keep going and pull the results starting from where it left off? Or even go page by page and get all the data from each page until there is no data left?
It would be helpful if you presented a sample response from your API.
If the API is equipped properly, there will be a next property in a given response that leads you to the next page.
You can then keep calling the API with the link given in next, recursively. On the last page, there will be no next entry in the Link header.
resp.links["next"]["url"] will give you the URL to the next page.
For example, the GitHub API has next, last, first, and prev properties.
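For instance, requests parses the Link header into the resp.links dictionary, keyed by the rel value, so a minimal check looks like this (the GitHub URL is only used here as an example of a paginated endpoint):
import requests

resp = requests.get("https://api.github.com/users/octocat/repos?per_page=2")
print(resp.links)  # e.g. {'next': {'url': '...', 'rel': 'next'}, 'last': {'url': '...', 'rel': 'last'}}
next_url = resp.links.get("next", {}).get("url")  # None when there is no next page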
To put it into code, first, you need to turn your code into functions.
Given that there is a maximum of 500 results per page, it implies you are extracting a list of data of some sort from the API. Often, these data are returned in a list somewhere inside raw.
For now, let's assume you want to extract all elements inside a list at raw.get('data').
import requests

header = {"Authorization": "Bearer <my_api_token>"}
results_per_page = 500


def compose_url():
    return (
        "https://my_api.com/v1/users"
        + "?per_page="
        + str(results_per_page)
        + "&page_number="
        + "1"
    )


def get_result(url=None):
    if url is None:
        url_get = compose_url()
    else:
        url_get = url
    s = requests.Session()
    s.proxies = {"http": "<my_proxies>", "https": "<my_proxies>"}
    resp = s.get(url_get, headers=header, verify=False)
    # You may also want to check the status code
    if resp.status_code != 200:
        raise Exception(resp.status_code)
    raw = resp.json()  # of type dict
    data = raw.get("data")  # of type list
    if "next" not in resp.links:
        # We are at the last page, return data
        return data
    # Otherwise, recursively get results from the next url
    return data + get_result(resp.links["next"]["url"])  # concat lists


def main():
    # Driver function
    data = get_result()
    # Then you can print the data or save it to a file


if __name__ == "__main__":
    # Now run the driver function
    main()
However, if there isn't a proper Link header, I see 2 solutions:
(1) recursion and (2) loop.
I'll demonstrate recursion.
As you have mentioned, when there is pagination in API responses, i.e. when there is a limit on the maximum number of results per page, there is often a query parameter called page number, or a start index of some sort, to indicate which "page" you are querying. So we'll utilise the page_number parameter in the code.
The logic is:
Given an HTTP response, if there are fewer than 500 results, it means there are no more pages. Return the results.
If there are 500 results in a given response, it means there is probably another page, so we advance page_number by 1, recurse (by calling the function itself), and concatenate with the previous results.
import requests

header = {"Authorization": "Bearer <my_api_token>"}
results_per_page = 500


def compose_url(results_per_page, current_page_number):
    return (
        "https://my_api.com/v1/users"
        + "?per_page="
        + str(results_per_page)
        + "&page_number="
        + str(current_page_number)
    )


def get_result(current_page_number):
    s = requests.Session()
    s.proxies = {"http": "<my_proxies>", "https": "<my_proxies>"}
    url = compose_url(results_per_page, current_page_number)
    resp = s.get(url, headers=header, verify=False)
    # You may also want to check the status code
    if resp.status_code != 200:
        raise Exception(resp.status_code)
    raw = resp.json()  # of type dict
    data = raw.get("data")  # of type list
    # If the length of data is smaller than results_per_page (500 of them),
    # that means there are no more pages
    if len(data) < results_per_page:
        return data
    # Otherwise, advance the page number and do a recursion
    return data + get_result(current_page_number + 1)  # concat lists


def main():
    # Driver function
    data = get_result(1)
    # Then you can print the data or save it to a file


if __name__ == "__main__":
    # Now run the driver function
    main()
If you truly want to store the raw responses, you can. However, you'll still need to check the number of results in a given response. The logic is similar: if a given raw contains 500 results, it means there is probably another page, so we advance the page number by 1 and recurse.
Let's still assume raw.get('data') is the list whose length is the number of results.
Because JSON/dictionary files cannot be simply concatenated, you can store raw (which is a dictionary) of each page into a list of raws. You can then parse and synthesize the data in whatever way you want.
Use the following get_result function:
def get_result(current_page_number):
    s = requests.Session()
    s.proxies = {"http": "<my_proxies>", "https": "<my_proxies>"}
    url = compose_url(results_per_page, current_page_number)
    resp = s.get(url, headers=header, verify=False)
    # You may also want to check the status code
    if resp.status_code != 200:
        raise Exception(resp.status_code)
    raw = resp.json()  # of type dict
    data = raw.get("data")  # of type list
    if len(data) == results_per_page:
        return [raw] + get_result(current_page_number + 1)  # concat lists
    return [raw]  # convert raw into a list object on the fly
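Since you mentioned loading everything into a JSON file, the collected pages (or the flattened data) can then be written out with the standard json module; a minimal sketch building on the get_result above (the output file name is arbitrary):
import json

def main():
    pages = get_result(1)  # list of raw page dicts from the function above
    all_data = [item for page in pages for item in page.get("data", [])]
    with open("users.json", "w") as f:
        json.dump(all_data, f, indent=2)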
As for the loop method, the logic is similar to the recursion. Essentially, you call the get_result() function a number of times, collect the results, and break early when the furthest page contains fewer than 500 results.
If you know the total number of results in advance, you can simply run the loop a predetermined number of times.
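A rough sketch of that loop variant, reusing compose_url and the same session setup as above (the same assumptions about the API apply):
def get_all_results():
    s = requests.Session()
    s.proxies = {"http": "<my_proxies>", "https": "<my_proxies>"}
    all_data = []
    page_number = 1
    while True:
        url = compose_url(results_per_page, page_number)
        resp = s.get(url, headers=header, verify=False)
        if resp.status_code != 200:
            raise Exception(resp.status_code)
        data = resp.json().get("data")
        all_data += data
        if len(data) < results_per_page:  # last page reached, stop looping
            break
        page_number += 1
    return all_data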
Do you follow? Do you have any further questions?
(I'm a little confused by what you mean by "load it into a JSON file". Do you mean saving the final raw results into a JSON file? Or are you referring to the .json() method in resp.json()? In the latter case, you don't need import json to do resp.json(); the .json() method on resp is part of the requests module.)
As a bonus, you can make your HTTP requests asynchronous, but this is slightly beyond the scope of your original question.
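For what it's worth, one simple way to parallelise the requests is a thread pool, but it only works cleanly when the total number of pages is known (or can be estimated) up front; a rough sketch under that assumption:
import requests
from concurrent.futures import ThreadPoolExecutor

def fetch_page(page_number):
    resp = requests.get(
        compose_url(results_per_page, page_number),
        headers=header,
        proxies={"http": "<my_proxies>", "https": "<my_proxies>"},
        verify=False,
    )
    resp.raise_for_status()
    return resp.json().get("data")

def fetch_all(total_pages):  # total_pages is assumed to be known in advance
    with ThreadPoolExecutor(max_workers=5) as pool:
        pages = pool.map(fetch_page, range(1, total_pages + 1))
    return [item for page in pages for item in page]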
P.s. I'm happy to learn what other solutions, perhaps more elegant ones, that people use.

Why must I use DataParallel when testing?

Train on the GPU, num_gpus is set to 1:
device_ids = list(range(num_gpus))
model = NestedUNet(opt.num_channel, 2).to(device)
model = nn.DataParallel(model, device_ids=device_ids)
Test on the CPU:
model = NestedUNet_Purn2(opt.num_channel, 2).to(dev)
device_ids = list(range(num_gpus))
model = torch.nn.DataParallel(model, device_ids=device_ids)
model_old = torch.load(path, map_location=dev)
pretrained_dict = model_old.state_dict()
model_dict = model.state_dict()
pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in model_dict}
model_dict.update(pretrained_dict)
model.load_state_dict(model_dict)
This will get the correct result, but when I delete:
device_ids = list(range(num_gpus))
model = torch.nn.DataParallel(model, device_ids=device_ids)
the result is wrong.
nn.DataParallel wraps the model, where the actual model is assigned to the module attribute. That also means that the keys in the state dict have a module. prefix.
Let's look at a very simplified version with just one convolution to see the difference:
class NestedUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)


model = NestedUNet()
model.state_dict().keys()  # => odict_keys(['conv1.weight', 'conv1.bias'])

# Wrap the model in DataParallel
model_dp = nn.DataParallel(model, device_ids=range(num_gpus))
model_dp.state_dict().keys()  # => odict_keys(['module.conv1.weight', 'module.conv1.bias'])
The state dict you saved with nn.DataParallel does not line up with the regular model's state. You are merging the current state dict with the loaded state dict, which means that the loaded state is ignored, because the model does not have any attributes that belong to those keys, and instead you are left with the randomly initialised model.
To avoid making that mistake, you shouldn't merge the state dicts, but rather directly apply it to the model, in which case there will be an error if the keys don't match.
RuntimeError: Error(s) in loading state_dict for NestedUNet:
Missing key(s) in state_dict: "conv1.weight", "conv1.bias".
Unexpected key(s) in state_dict: "module.conv1.weight", "module.conv1.bias".
To make the state dict that you have saved compatible, you can strip off the module. prefix:
pretrained_dict = {key.replace("module.", ""): value for key, value in pretrained_dict.items()}
model.load_state_dict(pretrained_dict)
You can also avoid this issue in the future by unwrapping the model from nn.DataParallel before saving its state, i.e. saving model.module.state_dict(). So you can always load the model first with its state and then later decide to put it into nn.DataParallel if you wanted to use multiple GPUs.
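A minimal sketch of that save/load flow, reusing the names from the question (the weights file name is a placeholder):
# Training side: save the unwrapped weights.
model = NestedUNet(opt.num_channel, 2).to(device)
model = nn.DataParallel(model, device_ids=device_ids)
# ... training ...
torch.save(model.module.state_dict(), "nested_unet_weights.pth")  # placeholder path

# Testing side: load into a plain model; wrap in nn.DataParallel only if you need it.
model = NestedUNet(opt.num_channel, 2).to(dev)
model.load_state_dict(torch.load("nested_unet_weights.pth", map_location=dev))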
You trained your model using DataParallel and saved it. So, the model weights were stored with a module. prefix. Now, when you load without DataParallel, you basically are not loading any model weights (the model has random weights). As a result, the model predictions are wrong.
I am giving an example.
model = nn.Linear(2, 4)
model = torch.nn.DataParallel(model, device_ids=device_ids)
model.state_dict().keys() # => odict_keys(['module.weight', 'module.bias'])
On the other hand,
another_model = nn.Linear(2, 4)
another_model.state_dict().keys() # => odict_keys(['weight', 'bias'])
See the difference in the OrderedDict keys.
So, in your code, the following three lines work, but no model weights are loaded.
pretrained_dict = model_old.state_dict()
model_dict = model.state_dict()
pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in model_dict}
Here, model_dict has keys without the module. prefix, but pretrained_dict has keys with it when you do not use DataParallel. So, essentially, pretrained_dict ends up empty when DataParallel is not used.
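To see that concretely, here is the filtering step applied to the two toy models above:
pretrained_dict = model.state_dict()     # keys: 'module.weight', 'module.bias'
model_dict = another_model.state_dict()  # keys: 'weight', 'bias'
filtered = {k: v for k, v in pretrained_dict.items() if k in model_dict}
print(filtered)  # {} - no keys match, so nothing is actually loaded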
Solution: if you want to avoid using DataParallel, you can load the weights file, create a new OrderedDict without the module. prefix, and load it back.
Something like the following would work for your case without using DataParallel.
# original saved file with DataParallel
model_old = torch.load(path, map_location=dev)

# create a new OrderedDict that does not contain `module.`
from collections import OrderedDict

new_state_dict = OrderedDict()
for k, v in model_old.state_dict().items():
    name = k[7:]  # remove `module.`
    new_state_dict[name] = v

# load params
model.load_state_dict(new_state_dict)

how to chunk a csv (dict)reader object in python 3.2?

I am trying to use Pool from the multiprocessing module to speed up reading in large CSV files. For this, I adapted an example (from py2k), but it seems like the csv.DictReader object has no length. Does that mean I can only iterate over it? Is there a way to chunk it anyway?
These questions seemed relevant, but did not really answer my question:
Number of lines in csv.DictReader,
How to chunk a list in Python 3?
My code tried to do this:
source = open('/scratch/data.txt', 'r')

def csv2nodes(r):
    strptime = time.strptime
    mktime = time.mktime
    l = []
    ppl = set()
    for row in r:
        cell = int(row['cell'])
        id = int(row['seq_ei'])
        st = mktime(strptime(row['dat_deb_occupation'], '%d/%m/%Y'))
        ed = mktime(strptime(row['dat_fin_occupation'], '%d/%m/%Y'))
        # collect list
        l.append([(id, cell, {1: st, 2: ed})])
        # collect separate sets
        ppl.add(id)
    return (l, ppl)

def csv2graph(source):
    r = csv.DictReader(source, delimiter=',')
    MG = nx.MultiGraph()
    l = []
    ppl = set()
    # Remember that I use integers for edge attributes, to save space! Dict above.
    # start: 1
    # end: 2
    p = Pool(processes=4)
    node_divisor = len(p._pool) * 4
    node_chunks = list(chunks(r, int(len(r) / int(node_divisor))))
    num_chunks = len(node_chunks)
    pedgelists = p.map(csv2nodes,
                       zip(node_chunks))
    ll = []
    for l in pedgelists:
        ll.append(l[0])
        ppl.update(l[1])
    MG.add_edges_from(ll)
    return (MG, ppl)
From the csv.DictReader documentation (and the csv.reader it wraps), the class returns an iterator, which is why the code threw a TypeError when you called len().
You can still chunk the data, but you'll have to read it entirely into memory. If you're concerned about memory, you can switch from csv.DictReader to csv.reader and skip the overhead of the dictionaries csv.DictReader creates. To improve readability in csv2nodes(), you can assign constants to address each field's index (a rough sketch of the chunked version follows below):
CELL = 0
SEQ_EI = 1
DAT_DEB_OCCUPATION = 4
DAT_FIN_OCCUPATION = 5
I also recommend using a different variable than id, since that's a built-in function name.
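Here is that rough sketch: switch to csv.reader, read everything into memory, and split it into chunks for the pool. The chunks() helper and the worker count are illustrative, and csv2nodes would need to index rows (e.g. row[CELL]) instead of using dict keys:
import csv
from multiprocessing import Pool

def chunks(seq, size):
    """Yield successive size-sized slices of a list."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def csv2graph(source):
    reader = csv.reader(source, delimiter=',')
    next(reader)                  # skip the header row; fields are addressed by index
    rows = list(reader)           # materialise the iterator so it can be split
    p = Pool(processes=4)
    node_divisor = len(p._pool) * 4
    chunk_size = max(1, len(rows) // node_divisor)
    node_chunks = list(chunks(rows, chunk_size))
    pedgelists = p.map(csv2nodes, node_chunks)
    # ... combine the per-chunk results into the graph as in the original csv2graph ...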