Use string variable as key in nested json in Python

I have a JSON object, data, I need to modify.
Right now, I am modifying the object as follows:
data['Foo']['Bar'] = 'ExampleString'
Is it possible to use a string variable to do the indexing?
s = 'Foo/Bar'
data[s.split('/')] = 'ExampleString'
The above code does not work.
How can I achieve the behavior I am after?
NB : I am looking for a solution which supports arbitrary number of key "levels", for instance the string variable may be Foo/Bar/Baz, or Foo/Bar/Baz/Foo/Bar, which would correspond to data['Foo']['Bar']['Baz'] and data['Foo']['Bar']['Baz']['Foo']['Bar'].

Without completely changing the data class you want to use, this might be easiest:
def jsonSetPath(jobj, path, item):
    prev = None
    y = jobj
    for x in path.split('/'):
        prev = y
        y = y[x]  # raises KeyError if any key on the path, including the last, is missing
    prev[x] = item
This is a small Python wrapper that descends iteratively into the object. You can then use
jsonSetPath(data, 'foo/obj', 3)
normally. You can add this functionality to your dictionary by inheriting dict if you prefer:
class JsonDict(dict):
    def __getitem__(self, path):
        # We only accept strings in this dictionary
        y = self
        for x in path.split('/'):
            y = dict.get(y, x)
        return y
    def __setitem__(self, path, item):
        # We only accept strings in this dictionary
        y = self
        prev = None
        for x in path.split('/'):
            prev = y
            y = dict.get(y, x)
        # Call dict.__setitem__ directly: prev[x] = item would recurse
        # into this method forever when the path has a single level
        dict.__setitem__(prev, x, item)
Note that using UserDict from collections may be advisable, but it seems too much of a hassle without converting all the inner dictionaries to user dictionaries. Now you wrap your data (data = JsonDict(data)) and use it as you wanted. If you want to use non-strings as your keys, you need to handle that (though I am not sure that makes sense in this specific dictionary implementation).
Note that only the "outer" dictionary is your custom dictionary. If the use case is more advanced you would need to convert all the inner ones as well, and then you might as well use UserDict.

A very naive solution to get you going in the correct direction.
You need to add error handling: for example, what happens if a key is missing somewhere along the path? You can either bail out or add a new dict on the fly.
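def update(path, d, value):
    *parents, last_key = path.split('/')
    # Walk down to the dict that holds the final key
    for nested_key in parents:
        d = d[nested_key]
    # Assign on the parent, so this also works when the current value is itself a dict
    d[last_key] = value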
one_level_path = 'Foo/Bar'
one_level_dict = {'Foo': {'Bar': None}}
print(one_level_dict)
update(one_level_path, one_level_dict, 1)
print(one_level_dict)
two_level_path = 'Foo/Bar/Baz'
two_level_dict = {'Foo': {'Bar': {'Baz': None}}}
print(two_level_dict)
update(two_level_path, two_level_dict, 1)
print(two_level_dict)
Outputs
{'Foo': {'Bar': None}}
{'Foo': {'Bar': 1}}
{'Foo': {'Bar': {'Baz': None}}}
{'Foo': {'Bar': {'Baz': 1}}}
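The "add a new dict on the fly" option mentioned above could be sketched with dict.setdefault. This is only an illustration: the name set_path and the choice to create missing levels silently are assumptions, not part of the answer above.

```python
def set_path(d, path, value, sep='/'):
    """Set d[k1][k2]...[kn] = value, creating missing intermediate dicts."""
    *parents, last = path.split(sep)
    for key in parents:
        # setdefault returns the existing child, or inserts and returns a new dict
        d = d.setdefault(key, {})
    d[last] = value


data = {'Foo': {'Bar': None}}
set_path(data, 'Foo/Bar', 'ExampleString')  # existing key is overwritten
set_path(data, 'Foo/Baz/Qux', 1)            # 'Baz' is created on the fly
print(data)  # {'Foo': {'Bar': 'ExampleString', 'Baz': {'Qux': 1}}}
```

If an intermediate key exists but holds something other than a dict (for example None), this sketch still fails, so real code would need to decide how to handle that case.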

Using recursion:
x = {'foo': {'in': {'inner': 9}}}
path = "foo/in/inner"
def setVal(obj, pathList, val):
    if len(pathList) == 1:
        obj[pathList[0]] = val
    else:
        return setVal(obj[pathList[0]], pathList[1:], val)
print(x)
setVal(x, path.split('/'), 10)
print(x)

Related

Why must I use DataParallel when testing?

Train on the GPU, num_gpus is set to 1:
device_ids = list(range(num_gpus))
model = NestedUNet(opt.num_channel, 2).to(device)
model = nn.DataParallel(model, device_ids=device_ids)
Test on the CPU:
model = NestedUNet_Purn2(opt.num_channel, 2).to(dev)
device_ids = list(range(num_gpus))
model = torch.nn.DataParallel(model, device_ids=device_ids)
model_old = torch.load(path, map_location=dev)
pretrained_dict = model_old.state_dict()
model_dict = model.state_dict()
pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in model_dict}
model_dict.update(pretrained_dict)
model.load_state_dict(model_dict)
This will get the correct result, but when I delete:
device_ids = list(range(num_gpus))
model = torch.nn.DataParallel(model, device_ids=device_ids)
the result is wrong.
nn.DataParallel wraps the model, where the actual model is assigned to the module attribute. That also means that the keys in the state dict have a module. prefix.
Let's look at a very simplified version with just one convolution to see the difference:
class NestedUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
model = NestedUNet()
model.state_dict().keys() # => odict_keys(['conv1.weight', 'conv1.bias'])
# Wrap the model in DataParallel
model_dp = nn.DataParallel(model, device_ids=range(num_gpus))
model_dp.state_dict().keys() # => odict_keys(['module.conv1.weight', 'module.conv1.bias'])
The state dict you saved with nn.DataParallel does not line up with the regular model's state dict. Because you merge the loaded state dict into the current one and keep only the keys that already exist in the model, every loaded entry is discarded: none of the module.-prefixed keys match, so you are left with the randomly initialised model.
To avoid making that mistake, you shouldn't merge the state dicts, but rather directly apply it to the model, in which case there will be an error if the keys don't match.
RuntimeError: Error(s) in loading state_dict for NestedUNet:
Missing key(s) in state_dict: "conv1.weight", "conv1.bias".
Unexpected key(s) in state_dict: "module.conv1.weight", "module.conv1.bias".
To make the state dict that you have saved compatible, you can strip off the module. prefix:
pretrained_dict = {key.replace("module.", ""): value for key, value in pretrained_dict.items()}
model.load_state_dict(pretrained_dict)
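One caveat with str.replace is that it removes every occurrence of "module." in a key, not just the leading one. A minimal sketch that strips only the leading prefix is shown here on plain dicts standing in for state dicts; the helper name strip_prefix and the placeholder values are assumptions for illustration.

```python
def strip_prefix(state_dict, prefix='module.'):
    """Return a new dict with `prefix` removed from the start of each key only."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Plain dict standing in for a state dict saved from a DataParallel model
saved = {'module.conv1.weight': 'W', 'module.conv1.bias': 'b'}
print(strip_prefix(saved))  # {'conv1.weight': 'W', 'conv1.bias': 'b'}

# A key that contains "module." again further in is left intact after the prefix
tricky = 'module.sub_module.weight'
print(strip_prefix({tricky: 1}))    # {'sub_module.weight': 1}
print(tricky.replace('module.', ''))  # sub_weight
```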
You can also avoid this issue in the future by unwrapping the model from nn.DataParallel before saving its state, i.e. saving model.module.state_dict(). So you can always load the model first with its state and then later decide to put it into nn.DataParallel if you wanted to use multiple GPUs.
You trained your model using DataParallel and saved it. So, the model weights were stored with a module. prefix. Now, when you load without DataParallel, you basically are not loading any model weights (the model has random weights). As a result, the model predictions are wrong.
Here is an example.
model = nn.Linear(2, 4)
model = torch.nn.DataParallel(model, device_ids=device_ids)
model.state_dict().keys() # => odict_keys(['module.weight', 'module.bias'])
On the other hand,
another_model = nn.Linear(2, 4)
another_model.state_dict().keys() # => odict_keys(['weight', 'bias'])
See the difference in the OrderedDict keys.
So, in your code, the following three lines run without error, but no model weights are loaded.
pretrained_dict = model_old.state_dict()
model_dict = model.state_dict()
pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in model_dict}
Here, when you do not use DataParallel, model_dict has keys without the module. prefix but pretrained_dict has keys with it. So, essentially, pretrained_dict ends up empty when DataParallel is not used.
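The silent failure can be reproduced without PyTorch. In this sketch plain dicts stand in for the two state dicts, and the values are placeholders rather than real tensors.

```python
# Keys as saved from a DataParallel-wrapped model
saved_with_dp = {'module.weight': 'W', 'module.bias': 'b'}
# Keys of a model that is not wrapped in DataParallel
plain_model = {'weight': 'random W', 'bias': 'random b'}

# Same filter as in the question: keep only keys the model already has
filtered = {k: v for k, v in saved_with_dp.items() if k in plain_model}
print(filtered)  # {}

plain_model.update(filtered)  # updating with an empty dict changes nothing
print(plain_model)  # {'weight': 'random W', 'bias': 'random b'}
```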
Solution: If you want to avoid using DataParallel, you can load the weights file, create a new OrderedDict without the module. prefix, and load that into the model.
Something like the following would work for your case without using DataParallel.
# original saved file with DataParallel
model_old = torch.load(path, map_location=dev)
# create new OrderedDict that does not contain `module.`
from collections import OrderedDict
new_state_dict = OrderedDict()
# the file in the question holds a whole model, so take its state dict first
for k, v in model_old.state_dict().items():
    name = k[7:] if k.startswith('module.') else k # remove `module.`
    new_state_dict[name] = v
# load params
model.load_state_dict(new_state_dict)

In Python, how to concisely replace nested values in json data?

This is an extension to "In Python, how to concisely get nested values in json data?"
I have data loaded from JSON and am trying to replace arbitrary nested values using a list as input, where the list corresponds to the names of successive children. I want a function replace_value(data,lookup,value) that replaces the value in the data by treating each entry in lookup as a nested child.
Here is the structure of what I'm trying to do:
json_data = {'alldata':{'name':'CAD/USD','TimeSeries':{'dates':['2018-01-01','2018-01-02'],'rates':[1.3241,1.3233]}}}
def replace_value(data,lookup,value):
    DEFINITION
lookup = ['alldata','TimeSeries','rates']
replace_value(json_data,lookup,[2,3])
# The following should return [2,3]
print(json_data['alldata']['TimeSeries']['rates'])
I was able to make a start with get_value(), but am stumped about how to do replacement. I'm not fixed to this code structure, but want to be able to programmatically replace a value in the data given the list of successive children and the value to replace.
Note: it is possible that lookup can be of length 1
Follow the lookups until we're second from the end, then assign the value to the last lookup in the current object:
def get_value(data,lookup): # Or whatever definition you like
    res = data
    for item in lookup:
        res = res[item]
    return res
def replace_value(data, lookup, value):
    obj = get_value(data, lookup[:-1])
    obj[lookup[-1]] = value
json_data = {'alldata':{'name':'CAD/USD','TimeSeries':{'dates':['2018-01-01','2018-01-02'],'rates':[1.3241,1.3233]}}}
lookup = ['alldata','TimeSeries','rates']
replace_value(json_data,lookup,[2,3])
print(json_data['alldata']['TimeSeries']['rates']) # [2, 3]
If you're worried about the list copy lookup[:-1], you can replace it with an iterator slice:
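from itertools import islice
def replace_value(data, lookup, value):
    it = iter(lookup)
    # every key except the last; named `parents` to avoid shadowing the built-in slice
    parents = islice(it, len(lookup) - 1)
    obj = get_value(data, parents)
    final = next(it)  # the one key the islice left unconsumed
    obj[final] = value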
You can obtain the parent to the final sub-dict first, so that you can reference it to alter the value of that sub-dict under the final key:
def replace_value(data, lookup, replacement):
    *parents, key = lookup
    for parent in parents:
        data = data[parent]
    data[key] = replacement
so that:
json_data = {'alldata':{'name':'CAD/USD','TimeSeries':{'dates':['2018-01-01','2018-01-02'],'rates':[1.3241,1.3233]}}}
lookup = ['alldata','TimeSeries','rates']
replace_value(json_data,lookup,[2,3])
print(json_data['alldata']['TimeSeries']['rates'])
outputs:
[2, 3]
Once you have get_value, the replacement is a one-liner:
get_value(json_data, lookup[:-1])[lookup[-1]] = value
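The same idea can also be written with functools.reduce and operator.getitem from the standard library. This is a sketch rather than a recommendation over the loops above; it reuses the names from the question and a trimmed copy of its data.

```python
from functools import reduce
from operator import getitem


def get_value(data, lookup):
    # reduce applies getitem step by step: data[k1][k2]...
    # With an empty lookup it returns data itself.
    return reduce(getitem, lookup, data)


def replace_value(data, lookup, value):
    *parents, key = lookup
    get_value(data, parents)[key] = value


json_data = {'alldata': {'TimeSeries': {'rates': [1.3241, 1.3233]}}}
# getitem works on lists too, so the last step may be a list index
replace_value(json_data, ['alldata', 'TimeSeries', 'rates', 0], 2)
print(json_data['alldata']['TimeSeries']['rates'])  # [2, 1.3233]

# A lookup of length 1 assigns directly on the top-level dict
flat = {'a': 1}
replace_value(flat, ['a'], 5)
print(flat)  # {'a': 5}
```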

using the right side of the disjoint union properly

What's the best way to turn a Right[List] into a List?
I parse a JSON string like so:
val parsed_states = io.circe.parser.decode[List[List[String]]](source)
That creates a value equivalent to this:
val example_data = Right(List(List("NAME", "state"), List("Alabama", "01"), List("Alaska", "02"), List("Arizona", "04")))
I'm trying to grok Right, Left, Either and implement the best way to get a list of StateName, StateValue pairs out of that list above.
I see that any of these ways will give me what I need (while dropping the header):
val parsed_states = example_data.toSeq(0).tail
val parsed_states = example_data.getOrElse(<ProbUseNoneHere>).iterator.to(Seq).tail
val parsed_states = example_data.getOrElse(<ProbUseNoneHere>).asInstanceOf[Seq[List[String]]].tail
I guess I'm wondering if I should do it one way or another based on the possible behavior upstream coming out of io.circe.parser.decode or am I overthinking this. I'm new to the Right, Left, Either paradigm and not finding terribly many helpful examples.
In reply to #slouc:
I'm trying to connect the dots from your answer as they apply to this use case. So something like this?
def blackBox: String => Either[Exception, List[List[String]]] = (url: String) => {
  if (url == "passalong") {
    Right(List(List("NAME", "state"), List("Alabama", "01"), List("Alaska", "02"), List("Arizona", "04")))
  }
  else Left(new Exception(s"This didn't work bc blackbox didn't parse ${url}"))
}
//val seed = "passalong"
val seed = "notgonnawork"
val xx: Either[Exception, List[List[String]]] = blackBox(seed)
def ff(i: List[List[String]]) = i.tail
val yy = xx.map(ff)
val zz = xx.fold(
  _ => throw new Exception("<need info here>"),
  i => i.tail)
The trick is in not getting state name / state value pairs out of the Either. They should be kept inside. If you want to, you can transform the Either type into something else (e.g. an Option by discarding whatever you possibly had on the left side), but don't destroy the effect. Something should be there to show that decoding could have failed; it can be an Either, Option, Try, etc. Eventually you will process left and right case accordingly, but this should happen as late as possible.
Let's take the following trivial example:
val x: Either[String, Int] = Right(42)
def f(i: Int) = i + 1
You might argue that you need to get the 42 out of the Right so that you can pass it to f. But that's not correct. Let's rewrite the example:
val x: Either[String, Int] = someFunction()
Now what? We have no idea whether we have a Left or a Right in value x, so we can't "get it out". Which integer would you obtain in case it's a Left? (if you really do have an integer value to use in that case, that's fair enough, and I will address that use case a bit later)
What you need to do instead is keep the effect (in this case Either), and you need to continue working in the context of that effect. It's there to show that there was a point in your program (in this case someFunction(), or decoding in your original question) that might have gone wrong.
So if you want to apply f to your potential integer, you need to map the effect with it (we can do that because Either is a functor, but that's a detail which probably exceeds the scope of this answer):
val x: Either[String, Int] = Right(42)
def f(i: Int) = i + 1
val y = x.map(value => f(value)) // Right(43)
val y = x.map(f) // shorter, point-free notation
and
val x: Either[String, Int] = someFunction()
def f(i: Int) = i + 1
// either a Left with some String, or a Right with some integer increased by 1
val y = x.map(f)
Then, at the very end of the chain of computations, you can handle the Left and Right cases; for example, if you were processing an HTTP request, then in case of Left you might return a 500, and in case of Right return a 200.
To address the use case with default value mentioned earlier - if you really want to do that, get rid of the Left and in that case resolve into some value (e.g. 0), then you can use fold:
def f(i: Int) = i + 1
// if x = Left, then z = 0
// if x = Right, then z = x + 1
val z = x.fold(_ => 0, i => i + 1)

Python Return multiple Variables from a function

Not so much a question as a useful observation for people using Python.
Unlike many other programming languages, Python lets you return multiple values from a function
without explicitly building objects, lists, etc.
Simply write
return ReturnValue1, ReturnValue2, ReturnValue3
to return as many as you wish,
and to retrieve them:
ReturnValue1, ReturnValue2, ReturnValue3 = functionName(parameters)
But remember to keep the names in the same order as the returned values, just as with positional arguments to a function.
Cheers
As I am not able to just comment on your "Question", I have to put this into an answer:
To be precise, the return-value will be a tuple. So technically you are not returning multiple values, but an instance of the class tuple, containing those exact values. This provides the opportunity to receive those values in quite a lot of different ways:
def f():
    return 1, 2, 3
one, two, three = f() # one = 1, two = 2, three = 3
all_three_values = f() # all_three_values = (1, 2, 3)
a, *b = f() # a = 1, b = [2, 3]
assert isinstance(all_three_values, tuple) # passes
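Because the values are bound purely by position, the order and the number of names on the left-hand side both matter. A short sketch of what happens when they do not line up (the variable names are arbitrary):

```python
def f():
    return 1, 2, 3


# Names are bound by position, so the names themselves carry no meaning
three, two, one = f()
print(three, two, one)  # 1 2 3

# Too few (or too many) names on the left raises ValueError
try:
    a, b = f()
except ValueError as exc:
    print('unpacking failed:', exc)
```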
This may help you:
lst = ['a','s','d','f','w','e']
def list2variables(lst):
    di = {}
    for i in range(len(lst)): # capture index to form dict
        di[lst[i]] = lst[i]
    for k, v in di.items(): # move elements as global variable
        globals()[k] = v
    return globals() # return so that can be used in future
With a small change, the function could take a dict as its argument too.

how to chunk a csv (dict)reader object in python 3.2?

I am trying to use Pool from the multiprocessing module to speed up reading in large csv files. For this, I adapted an example (from py2k), but it seems like the csv.DictReader object has no length. Does it mean I can only iterate over it? Is there a way to chunk it still?
These questions seemed relevant, but did not really answer my question:
Number of lines in csv.DictReader,
How to chunk a list in Python 3?
My code tried to do this:
source = open('/scratch/data.txt','r')
def csv2nodes(r):
    strptime = time.strptime
    mktime = time.mktime
    l = []
    ppl = set()
    for row in r:
        cell = int(row['cell'])
        id = int(row['seq_ei'])
        st = mktime(strptime(row['dat_deb_occupation'],'%d/%m/%Y'))
        ed = mktime(strptime(row['dat_fin_occupation'],'%d/%m/%Y'))
        # collect list
        l.append([(id,cell,{1:st,2: ed})])
        # collect separate sets
        ppl.add(id)
    return (l,ppl)
def csv2graph(source):
    r = csv.DictReader(source,delimiter=',')
    MG=nx.MultiGraph()
    l = []
    ppl = set()
    # Remember that I use integers for edge attributes, to save space! Dic above.
    # start: 1
    # end: 2
    p = Pool(processes=4)
    node_divisor = len(p._pool)*4
    node_chunks = list(chunks(r,int(len(r)/int(node_divisor))))
    num_chunks = len(node_chunks)
    pedgelists = p.map(csv2nodes,
                       zip(node_chunks))
    ll = []
    for l in pedgelists:
        ll.append(l[0])
        ppl.update(l[1])
    MG.add_edges_from(ll)
    return (MG,ppl)
From the csv.DictReader documentation (and the documentation of the csv.reader object it wraps), a DictReader is an iterator and has no length. The code should have raised a TypeError when you called len() on it.
You can still chunk the data, but you'll have to read it entirely into memory. If you're concerned about memory you can switch from csv.DictReader to csv.reader and skip the overhead of the dictionaries csv.DictReader creates. To improve readability in csv2nodes(), you can assign constants to address each field's index:
CELL = 0
SEQ_EI = 1
DAT_DEB_OCCUPATION = 4
DAT_FIN_OCCUPATION = 5
I also recommend using a different variable than id, since that's a built-in function name.
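If a fixed chunk size, rather than a fixed number of chunks, is acceptable, rows can be grouped without ever calling len() on the reader. The following is a minimal sketch using itertools.islice; the helper name chunked, the chunk size, and the in-memory sample data are assumptions for illustration, and only two of the question's column names are used.

```python
import csv
import io
from itertools import islice


def chunked(iterable, size):
    """Yield lists of up to `size` items from any iterable, including a csv reader."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# In-memory stand-in for the real file
source = io.StringIO("cell,seq_ei\n1,10\n2,20\n3,30\n")
reader = csv.DictReader(source)
for chunk in chunked(reader, 2):
    print([(int(row['cell']), int(row['seq_ei'])) for row in chunk])
# [(1, 10), (2, 20)]
# [(3, 30)]
```

Each chunk is an ordinary list of rows, so it could be handed to a worker function such as csv2nodes in place of a slice of the reader.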