Two stages transfer learning - deep-learning

I have used MobilenetV2 as architecture with "imagenet" weights to classify between 4 classes of Xrays. I have a very good accuracy so I have saved these weights (Bioiatriki_project.h5). I want to further use the weights for another classification task but without 4 classes this time. I want to classify two classes .So my code in this second part for creating the model is
def create_model(pretrained=True):
mobile_model = MobileNetV2(
weights='/content/drive/MyDrive/Bioiatriki_project.h5',
input_shape=input_img_size,
alpha=1,
include_top=False)
print("mobileNetV2 has {} layers".format(len(mobile_model.layers)))
if pretrained:
for layer in mobile_model.layers[:-50]:
layer.trainable=False
for layer in mobile_model.layers[-50:]:
layer.trainable=True
else:
for layer in mobile_model.layers:
layer.trainable = True
model = mobile_model.layers[-3].output
model = layers.GlobalAveragePooling2D()(model)
model = layers.Dense(num_classes, activation="softmax", kernel_initializer='uniform')(model)
model = Model(inputs=mobile_model.input, outputs=model)
return model
This throws me this error:
ValueError: Weight count mismatch for layer #103 (named Conv_1_bn in the current model, dense in the save file). Layer expects 4 weight(s). Received 2 saved weight(s)
So how can I fix this?

Related

Arbitrary shaped Feedforward Neural Network in Pytorch

I am making a script that has some generative aspect to it, and I need to generate arbitrary shaped feedforward NNs. The idea is to pass a list With the number of number of neurons in each layer, and the number of layers is determined by the length of the list:
shape = [784,64,64,64,10]
I tried something like this:
shapenn = [784,64,64,64,10]
class Net(nn.Module):
def __init__(self, shapenn):
super().__init__()
self.shapenn = shapenn
self.fcl = [] # list with fully conected leyers
for i in range(len(act_funs)):
self.fcl.append(nn.Linear(self.nnarch[i],self.nnarch[i+1]))
net = Net(shapenn)
While the fully connected layers are created correctly in the list fcl, net is not initialized properly for example it has not net.parameters().
I am sure there is a correct way to do this, thank you very much in advance.
You need to use nn.ModuleList in place of the built-in python list (similarly use nn.ModuleDict in place of python dictionaries). These behave like a normal list except that they must only contain instances that subclasses nn.Module and using them signals that the modules contained in the list should be considered submodules of your module. For example
import torch.nn as nn
class Net(nn.Module):
def __init__(self):
super().__init__()
self.fcl = nn.ModuleList()
for i in range(5):
self.fcl.append(nn.Linear(10, 10))
net = Net()
print([name for name, val in net.named_parameters()])
prints
['fcl.0.weight', 'fcl.0.bias', 'fcl.1.weight', 'fcl.1.bias', 'fcl.2.weight', 'fcl.2.bias', 'fcl.3.weight', 'fcl.3.bias', 'fcl.4.weight', 'fcl.4.bias']

How to extract images, labels from csv file and create a trainset using torch?

I downloaded a dataset for facial key point detection the image and the labels were in a CSV file I extracted it using pandas but I don't know how to convert it into a tensor and load it into a data loader for training.
dataframe = pd.read_csv("training_facial_keypoints.csv")
dataframe['Image'] = dataframe['Image'].apply(lambda i: np.fromstring(i, sep=' '))
dataframe= dataframe.dropna()
images_array = np.vstack(dataframe['Image'].values)/255.0
images_array = images_array.astype(np.float32)
images_array = images_array.reshape(-1, 96, 96, 1)
print(images_array.shape)
labels_array = dataframe[dataframe.columns[:-1]].values
labels_array = (labels_array-48)/48
labels_array = labels_array.astype(np.float32)
I have the images and labels in two arrays. How do I create a training set from this and use transforms.
Then load it using a dataloader.
Create a subclass of torch.utils.data.Dataset, fill it with your data.
You can pass desired torchvision.transforms to it and apply them to your data in __getitem__(self, index).
Than you can pass it to torch.utils.data.DataLoader which allows multi-threaded loading of data.
And PyTorch has an overwhelming documentation you should first refer to.

Getting alignment/attention during translation in OpenNMT-py

Does anyone know how to get the alignments weights when translating in Opennmt-py? Usually the only output are the resulting sentences and I have tried to find a debugging flag or similar for the attention weights. So far, I have been unsuccessful.
I'm not sure if this is a new feature, since I did not come across this when looking for alignments a few months back, but onmt seems to have added a flag -report_align to output word alignments along with the translation.
https://opennmt.net/OpenNMT-py/FAQ.html#raw-alignments-from-averaging-transformer-attention-heads
Excerpt from opennnmt.net -
Currently, we support producing word alignment while translating for Transformer based models. Using -report_align when calling translate.py will output the inferred alignments in Pharaoh format. Those alignments are computed from an argmax on the average of the attention heads of the second to last decoder layer.
You can get the attention matrices. Note that it is not the same as alignment which is a term from statistical (not neural) machine translation.
There is a thread on github discussing it. Here is a snippet from the discussion. When you get the translations from the mode, the attentions are in the attn field.
import onmt
import onmt.io
import onmt.translate
import onmt.ModelConstructor
from collections import namedtuple
# Load the model.
Opt = namedtuple('Opt', ['model', 'data_type', 'reuse_copy_attn', "gpu"])
opt = Opt("PATH_TO_SAVED_MODEL", "text", False, 0)
fields, model, model_opt = onmt.ModelConstructor.load_test_model(
opt, {"reuse_copy_attn" : False})
# Test data
data = onmt.io.build_dataset(
fields, "text", "PATH_TO_DATA", None, use_filter_pred=False)
data_iter = onmt.io.OrderedIterator(
dataset=data, device=0,
batch_size=1, train=False, sort=False,
sort_within_batch=True, shuffle=False)
# Translator
translator = onmt.translate.Translator(
model, fields, beam_size=5, n_best=1,
global_scorer=None, cuda=True)
builder = onmt.translate.TranslationBuilder(
data, translator.fields, 1, False, None)
batch = next(data_iter)
batch_data = translator.translate_batch(batch, data)
translations = builder.from_batch(batch_data)
translations[0].attn # <--- here are the attentions

Understanding Keras model architecture (node index of nested model)

This script defining a dummy model using a small nested model
from keras.layers import Input, Dense
from keras.models import Model
import keras
input_inner = Input(shape=(4,), name='input_inner')
output_inner = Dense(3, name='inner_dense')(input_inner)
inner_model = Model(inputs=input_inner, outputs=output_inner)
input = Input(shape=(5,), name='input')
x = Dense(4, name='dense_1')(input)
x = inner_model(x)
x = Dense(2, name='dense_2')(x)
output = keras.layers.concatenate([x, x], name='concat_1')
model = Model(inputs=input, outputs=output)
print(model.summary())
yields the following output
Layer (type) Output Shape Param # Connected to
====================================================================================================
input (InputLayer) (None, 5) 0
____________________________________________________________________________________________________
dense_1 (Dense) (None, 4) 24 input[0][0]
____________________________________________________________________________________________________
model_1 (Model) (None, 3) 15 dense_1[0][0]
____________________________________________________________________________________________________
dense_2 (Dense) (None, 2) 8 model_1[1][0]
____________________________________________________________________________________________________
concat_1 (Concatenate) (None, 4) 0 dense_2[0][0]
dense_2[0][0]
My question concerns the content of the Connected to column.
I understand that a layer can have multiple nodes.
The notation of this column is layer_name[node_index][tensor_index].
If we regard inner_model as a layer I would expect it to have only one node, so I would expect dense_2 to be connected to model_1[0][0]. But in reality it is connected to model_1[1][0]. Why is this the case?
1.Background
When you say:
If we regard inner_model as a layer I would expect it to have only one
node
This is true in the sense that it has only one node which is part of the network.
Consider the github repository of the model.summary function. The function that prints the connections is print_layer_summary_with_connections (line 76), and it considers only the nodes from relevant_nodes array. All the nodes that are not in this array are considered not part of the network, and so the function skips them. The relevant lines are lines 88-90:
if relevant_nodes and node not in relevant_nodes:
# node is not part of the current network
continue
2.Your model
Now let's see what happens with your particular model. First let us define relevant_nodes:
relevant_nodes = []
for v in model.nodes_by_depth.values():
relevant_nodes += v
The array relevant_nodes looks like:
[<keras.engine.topology.Node at 0x9dfa518>,
<keras.engine.topology.Node at 0x9dfa278>,
<keras.engine.topology.Node at 0x9d8bac8>,
<keras.engine.topology.Node at 0x9d8ba58>,
<keras.engine.topology.Node at 0x9d74518>]
However, when we print the inbound nodes at every layer, we will get:
for i in model.layers:
print(i.inbound_nodes)
[<keras.engine.topology.Node object at 0x0000000009D74518>]
[<keras.engine.topology.Node object at 0x0000000009D8BA58>]
[<keras.engine.topology.Node object at 0x0000000009D743C8>, <keras.engine.topology.Node object at 0x0000000009D8BAC8>]
[<keras.engine.topology.Node object at 0x0000000009DFA278>]
[<keras.engine.topology.Node object at 0x0000000009DFA518>]
You can see that there is exactly one node in the list above that does not appear in relevant_nodes. This is the node in position 0 in the third array:
<keras.engine.topology.Node object at 0x0000000009D743C8>
It was not considered a part of the model, and hence did not appear in relevant_nodes. The node in position 1 in this array does appear in relevant_nodes, and this is why you see it as model_1[1][0].
3.The reason
The reason for that is basically the line x=inner_model(input). Even If you run much smaller model, as the one below:
input_inner = Input(shape=(4,), name='input_inner')
output_inner = Dense(3, name='inner_dense')(input_inner)
inner_model = Model(inputs=input_inner, outputs=output_inner)
input = Input(shape=(5,), name='input')
output = inner_model(input)
model = Model(inputs=input, outputs=output)
You will see that relevant_nodes contains two elements, while via
for i in model.layers:
print(i.inbound_nodes)
you'll get three nodes.
This is because layer 1 (of the smaller model above) has two nodes, but only the second one is considered part of the model. In particular, if you print the input at each one of the nodes at layer 1 with layer.get_input_at(node_index), you'll get:
print(model.layers[1].get_input_at(0))
print(model.layers[1].get_input_at(1))
#prints
/input_inner
/input
4.Answers to the questions in the comment
1) Do you also know what this non-relevant node is good for / where it
comes from?
This node seems to be an "internal node" created during the application of inner_model. In particular, if you print the input and output shape at each one of the three nodes (in the small model above), you get:
nodes=[model.layers[0].inbound_nodes[0],model.layers[1].inbound_nodes[0],model.layers[1].inbound_nodes[1]]
for i in nodes:
print(i.input_shapes)
print(i.output_shapes)
print(" ")
#prints
[(None, 5)]
[(None, 5)]
[(None, 4)]
[(None, 3)]
[(None, 5)]
[(None, 3)]
so you could see that the shapes of the middle node (the one that does not appear in the list of relevant nodes) correspond to the shapes in inner_model.
2) Will an inner model with n output nodes always present them with node
indices 1 to n instead of 0 to n-1?
I am not sure if always, as I guess there are various possibilities to have several output nodes nodes, but if I consider the following quite natural generalization of the small model above, this is indeed the case:
input_inner = Input(shape=(4,), name='input_inner')
output_inner = Dense(3, name='inner_dense')(input_inner)
inner_model = Model(inputs=input_inner, outputs=output_inner)
input = Input(shape=(5,), name='input')
output = inner_model(input)
output = inner_model(output)
model = Model(inputs=input, outputs=output)
print(model.summary())
Here I just added output = inner_model(output) to the small model. The list of relevant nodes is
[<keras.engine.topology.Node at 0xd10c390>,
<keras.engine.topology.Node at 0xd10c9b0>,
<keras.engine.topology.Node at 0xd10ca20>]
and the list of all inbound nodes is
[<keras.engine.topology.Node object at 0x000000000D10CA20>]
[<keras.engine.topology.Node object at 0x000000000D10C588>, <keras.engine.topology.Node object at 0x000000000D10C9B0>, <keras.engine.topology.Node object at 0x000000000D10C390>]
Indeed the node indices are 1 and 2, as you mentioned in the comment. It will continue similarly if I add another output = inner_model(output), with node indices being 1,2,3 and so on.
Updated on Sep, 2020. The selected answer was a bit outdated (the link does not point to the right place), and not exactly answered the question: model_1[1][0]. Why is 1 in in [1][0] in the this the case? Here's what I found.
The code I played with is as below (I added some names for layers for better reading). You can copy and run to see the output info.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
input_inner = layers.Input(shape=(4,), name='inn_input')
output_inner = layers.Dense(3, name='inn_dense')(input_inner)
inner_model = keras.Model(inputs=input_inner, outputs=output_inner,name='inn_model')
inn_allLayers = inner_model.layers
# print(type(inn_allLayers))
print(inner_model.name,': total layer number:',len(inn_allLayers))
for i in inn_allLayers:
print(i.name, i)
print(len(i._inbound_nodes))
for n in i._inbound_nodes:
print(n.get_config())
print(n)
print('===================')
print('************************************************')
nest_input = layers.Input(shape=(5,), name='nest_input')
nest_d1_out = layers.Dense(4, name='nest_dense_1')(nest_input)
nest_m_out = inner_model(nest_d1_out)
nest_d2_out = layers.Dense(2, name='nest_dense_2')(nest_m_out)
nest_add_out = layers.concatenate([nest_d2_out, nest_d2_out], name='nest_concat')
model = keras.Model(inputs=nest_input, outputs=nest_add_out,name='nest_model')
inn_allLayers = inner_model.layers
# print(type(inn_allLayers))
print(inner_model.name,': total layer number:',len(inn_allLayers))
for i in inn_allLayers:
print(i.name, i)
print(len(i._inbound_nodes))
for n in i._inbound_nodes:
print(n.get_config())
print(n)
print('===================')
print('************************************************')
allLayers = model.layers
# print(type(allLayers))
print(model.name,': total layer number:',len(allLayers))
for i in allLayers:
print(i.name, i)
print(len(i._inbound_nodes))
for n in i._inbound_nodes:
print(n.get_config())
print(n)
print('===================')
for op in tf.get_default_graph().get_operations():
print(str(op.name))
1. [1][0] represents [node_index][tensor_index]
2. what is node_index?
Under tensorflow/python/keras/engine/base_layer.py, it's described in this class:
class KerasHistory(
collections.namedtuple('KerasHistory',
['layer', 'node_index', 'tensor_index'])):
"""Tracks the Layer call that created a Tensor, for Keras Graph Networks.
During construction of Keras Graph Networks, this metadata is added to
each Tensor produced as the output of a Layer, starting with an
`InputLayer`. This allows Keras to track how each Tensor was produced, and
this information is later retraced by the `keras.engine.Network` class to
reconstruct the Keras Graph Network.
Attributes:
layer: The Layer that produced the Tensor.
node_index: The specific call to the Layer that produced this Tensor. Layers
can be called multiple times in order to share weights. A new node is
created every time a Tensor is called.
tensor_index: The output index for this Tensor. Always zero if the Layer
that produced this Tensor only has one output. Nested structures of
Tensors are deterministically assigned an index via `nest.flatten`.
"""
# Added to maintain memory and performance characteristics of `namedtuple`
# while subclassing.
It says a Node is created each time a Tensor is called. To me, it is a bit vague. My understanding is that when a layer is called, it produces a Tensor, and different ways involve calling this layer will create multiple nodes (will show some print results later.)
3. How to print each node?
Under the same py file, there is this snippet:
# Create node, add it to inbound nodes.
Node(
self,
inbound_layers=inbound_layers,
node_indices=node_indices,
tensor_indices=tensor_indices,
input_tensors=input_tensors,
output_tensors=output_tensors,
arguments=arguments)
# Update tensor history metadata.
# The metadata attribute consists of
# 1) a layer instance
# 2) a node index for the layer
# 3) a tensor index for the node.
# The allows layer reuse (multiple nodes per layer) and multi-output
# or multi-input layers (e.g. a layer can return multiple tensors,
# and each can be sent to a different layer).
for i, tensor in enumerate(nest.flatten(output_tensors)):
tensor._keras_history = KerasHistory(self,
len(self._inbound_nodes) - 1, i)
The self refers Layer object. the info is recoded in each tensor's _keras_history and self._inbound_nodes attribute. Hence we can print exactly the node by print(layer._inbound_nodes[index_of_node].get_config() I already typed the runnable code in the code at the beginning.
(What is inbound and outbound nodes? It's big confusing by first look, but if you imagine each node is an arrow pointing from one layer to another layer, it might be easier. The code description is below)
class Node(object):
"""A `Node` describes the connectivity between two layers.
Each time a layer is connected to some new input,
a node is added to `layer._inbound_nodes`.
Each time the output of a layer is used by another layer,
a node is added to `layer._outbound_nodes`.
Arguments:
outbound_layer: the layer that takes
`input_tensors` and turns them into `output_tensors`
(the node gets created when the `call`
method of the layer was called).
inbound_layers: a list of layers, the same length as `input_tensors`,
the layers from where `input_tensors` originate.
node_indices: a list of integers, the same length as `inbound_layers`.
`node_indices[i]` is the origin node of `input_tensors[i]`
(necessary since each inbound layer might have several nodes,
e.g. if the layer is being shared with a different data stream).
tensor_indices: a list of integers,
the same length as `inbound_layers`.
`tensor_indices[i]` is the index of `input_tensors[i]` within the
output of the inbound layer
(necessary since each inbound layer might
have multiple tensor outputs, with each one being
independently manipulable).
input_tensors: list of input tensors.
output_tensors: list of output tensors.
arguments: dictionary of keyword arguments that were passed to the
`call` method of the layer at the call that created the node.
`node_indices` and `tensor_indices` are basically fine-grained coordinates
describing the origin of the `input_tensors`.
A node from layer A to layer B is added to:
- A._outbound_nodes
- B._inbound_nodes
"""
4. Observe node creation.
You might notice there are two exactly same print blocks for inner_model in the code: one is before nested model is built, one is after.
The output is as below:
inn_model : total layer number: 2
inn_input <tensorflow.python.keras.engine.input_layer.InputLayer object at 0x7fd1c6755780>
1
{'outbound_layer': 'inn_input', 'inbound_layers': [], 'node_indices': [], 'tensor_indices': []}
<tensorflow.python.keras.engine.base_layer.Node object at 0x7fd1d2e75e10>
===================
inn_dense <tensorflow.python.keras.layers.core.Dense object at 0x7fd1d2e75e80>
1
{'outbound_layer': 'inn_dense', 'inbound_layers': 'inn_input', 'node_indices': 0, 'tensor_indices': 0}
<tensorflow.python.keras.engine.base_layer.Node object at 0x7fd1d2e92550>
===================
************************************************
inn_model : total layer number: 2
inn_input <tensorflow.python.keras.engine.input_layer.InputLayer object at 0x7fd1c6755780>
1
{'outbound_layer': 'inn_input', 'inbound_layers': [], 'node_indices': [], 'tensor_indices': []}
<tensorflow.python.keras.engine.base_layer.Node object at 0x7fd1d2e75e10>
===================
inn_dense <tensorflow.python.keras.layers.core.Dense object at 0x7fd1d2e75e80>
2
{'outbound_layer': 'inn_dense', 'inbound_layers': 'inn_input', 'node_indices': 0, 'tensor_indices': 0}
<tensorflow.python.keras.engine.base_layer.Node object at 0x7fd1d2e92550>
{'outbound_layer': 'inn_dense', 'inbound_layers': 'nest_dense_1', 'node_indices': 0, 'tensor_indices': 0}
<tensorflow.python.keras.engine.base_layer.Node object at 0x7fd1d2ac4358>
===================
************************************************
You will notice immediately that after the nested model is built, one extra (inbound)node (or an arrow) is created, pointing to inn_dense. One was created, pointing from inn_input to inn_dense, another was created, pointing from nest_dense_1 to inn_dense. This is what it was said earlier, each time a layer is called, a new node (an arrow) is created.
5. Question answered
So far, I think it already explains the original question: why is 1 in [1][0]. It is because reusing the the inner_model causes the inner_dense layer to be used to create a Tensor for a second time.
The rest of the code snippet has bit extra information, you can check it out and get a better idea under the hood.
It seems it is now "_nodes_by_depth" instead of "nodes_by_depth". Same for inbound_nodes etc. Maybe the answer has to be updated..

PyCaffe NetSpec, how to create layers with the same name but different phases? [duplicate]

I want to set up a caffe CNN with python, using caffe.NetSpec() interface. Although I saw we can put test net in solver.prototxt, I would like to write it in model.prototxt with different phase. For example, caffe model prototxt implement two data layer with different phases:
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
....
}
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
....
}
How should I do in python to get such implementation?
I assume you mean how to define phase when writing a prototxt using caffe.NetSpec?
from caffe import layers as L, params as P, to_proto
import caffe
ns = caffe.NetSpec()
ns.data = L.Data(name="data",
data_param={'source':'/path/to/lmdb','batch_size':32},
include={'phase':caffe.TEST})
If you want to have BOTH train and test layers in the same prototxt, what I usually do is making one ns for train with ALL layers and another ns_test with only the test version of the duplicate layers only. Then, when writing the actual prototxt file:
with open('model.prototxt', 'w') as W:
W.write('%s\n' % ns_test.to_proto())
W.write('%s\n' % ns.to_proto())
This way you'll have BOTH phases in the same prototxt. A bit hacky, I know.
I find an useful method.
You can add a key named name for your test phase layer, and modify the keys ntop and top
just like this:
net.data = L.Data(name='data',
include=dict(phase=caffe_pb2.Phase.Value('TRAIN')),
ntop=1)
net.test_data = L.Data(name='data',
include=dict(phase=caffe_pb2.Phase.Value('TEST')),
top='data',
ntop=0)
If your network is like:
layer {phase: TRAIN}
layer {phase: TEST}
layer {}
layer {phase: TRAIN}
layer {}
layer {phase: TEST}
layer {}
layer {}
layer {phase: TEST}
Create a train net ns,
Create a test net ns_test
Now you basically have two strings str(ns.to_proto()) and str(ns_test.to_proto())
Merge those two using python regex taking into account the required layer order.
I found another way.
I could solve this problem returning the proto string.
Basically, you can add strings with the layers that are going to be replaced (in my case, the first layer).
def lenet(path_to_lmdb_train, path_to_lmdb_test,
batch_size_train, batch_size_test ):
n = caffe.NetSpec()
n.data, n.label = L.Data(batch_size=batch_size_train, backend=P.Data.LMDB, source=path_to_lmdb_train,
include=dict(phase=caffe.TRAIN), transform_param=dict(scale=1./255), ntop=2)
first_layer = str(n.to_proto())
n.data, n.label = L.Data(batch_size=batch_size_test, backend=P.Data.LMDB, source=path_to_lmdb_test,
include=dict(phase=caffe.TEST), transform_param=dict(scale=1./255), ntop=2)
n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20, weight_filler=dict(type='xavier'))
n.pool1 = L.Pooling(n.conv1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
n.conv2 = L.Convolution(n.pool1, kernel_size=5, num_output=50, weight_filler=dict(type='xavier'))
n.pool2 = L.Pooling(n.conv2, kernel_size=2, stride=2, pool=P.Pooling.MAX)
n.ip1 = L.InnerProduct(n.pool2, num_output=500, weight_filler=dict(type='xavier'))
n.relu1 = L.ReLU(n.ip1, in_place=True)
n.ip2 = L.InnerProduct(n.relu1, num_output=10, weight_filler=dict(type='xavier'))
n.loss = L.SoftmaxWithLoss( n.ip2, n.label )
n.accuracy = L.Accuracy( n.ip2, n.label, include=dict(phase=caffe.TEST) )
return first_layer + str(n.to_proto())
Although several answers have been given, none covers a more real-world scenario where you don't even know (at the moment of writing code) the names of your layers. For example when you're assembling a network from smaller blocks, you can't write:
n.data = L.Data(#...
n.test_data = L.Data(#...
Because every next instantiation of the block would overwrite data and test_data (or batchnorm, which is more likely to be put in blocks).
Fortunately, you can assign to the NetSpec object via __getitem__, like so:
layer_name = 'norm{}'.format(i) #for example
n[layer_name + '_train'] = L.Data(#...
n[layer_name + '_test'] = L.Data(#...