This script defines a dummy model using a small nested model:
from keras.layers import Input, Dense
from keras.models import Model
import keras
input_inner = Input(shape=(4,), name='input_inner')
output_inner = Dense(3, name='inner_dense')(input_inner)
inner_model = Model(inputs=input_inner, outputs=output_inner)
input = Input(shape=(5,), name='input')
x = Dense(4, name='dense_1')(input)
x = inner_model(x)
x = Dense(2, name='dense_2')(x)
output = keras.layers.concatenate([x, x], name='concat_1')
model = Model(inputs=input, outputs=output)
print(model.summary())
yields the following output
Layer (type) Output Shape Param # Connected to
====================================================================================================
input (InputLayer) (None, 5) 0
____________________________________________________________________________________________________
dense_1 (Dense) (None, 4) 24 input[0][0]
____________________________________________________________________________________________________
model_1 (Model) (None, 3) 15 dense_1[0][0]
____________________________________________________________________________________________________
dense_2 (Dense) (None, 2) 8 model_1[1][0]
____________________________________________________________________________________________________
concat_1 (Concatenate) (None, 4) 0 dense_2[0][0]
dense_2[0][0]
My question concerns the content of the Connected to column.
I understand that a layer can have multiple nodes.
The notation of this column is layer_name[node_index][tensor_index].
If we regard inner_model as a layer I would expect it to have only one node, so I would expect dense_2 to be connected to model_1[0][0]. But in reality it is connected to model_1[1][0]. Why is this the case?
1. Background
When you say:
If we regard inner_model as a layer I would expect it to have only one
node
This is true in the sense that it has only one node which is part of the network.
Consider the source of the model.summary function on GitHub. The function that prints the connections is print_layer_summary_with_connections (line 76), and it considers only the nodes from the relevant_nodes array. All the nodes that are not in this array are considered not part of the network, so the function skips them. The relevant lines are lines 88-90:
if relevant_nodes and node not in relevant_nodes:
    # node is not part of the current network
    continue
2. Your model
Now let's see what happens with your particular model. First let us define relevant_nodes:
relevant_nodes = []
for v in model.nodes_by_depth.values():
    relevant_nodes += v
The array relevant_nodes looks like:
[<keras.engine.topology.Node at 0x9dfa518>,
<keras.engine.topology.Node at 0x9dfa278>,
<keras.engine.topology.Node at 0x9d8bac8>,
<keras.engine.topology.Node at 0x9d8ba58>,
<keras.engine.topology.Node at 0x9d74518>]
However, when we print the inbound nodes at every layer, we will get:
for i in model.layers:
    print(i.inbound_nodes)
[<keras.engine.topology.Node object at 0x0000000009D74518>]
[<keras.engine.topology.Node object at 0x0000000009D8BA58>]
[<keras.engine.topology.Node object at 0x0000000009D743C8>, <keras.engine.topology.Node object at 0x0000000009D8BAC8>]
[<keras.engine.topology.Node object at 0x0000000009DFA278>]
[<keras.engine.topology.Node object at 0x0000000009DFA518>]
You can see that there is exactly one node in the list above that does not appear in relevant_nodes. This is the node in position 0 in the third array:
<keras.engine.topology.Node object at 0x0000000009D743C8>
It was not considered a part of the model, and hence did not appear in relevant_nodes. The node in position 1 in this array does appear in relevant_nodes, and this is why you see it as model_1[1][0].
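To double-check which of model_1's inbound nodes the summary refers to, you can cross-check them against relevant_nodes. A small hedged sketch reusing the variables defined above (the layer name 'model_1' is taken from the summary):
inner = model.get_layer('model_1')
for idx, node in enumerate(inner.inbound_nodes):
    # only the node that appears in relevant_nodes is reported in the summary
    print(idx, node in relevant_nodes, inner.get_input_at(idx))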
3. The reason
The reason for this is essentially the line x = inner_model(x), i.e. applying the inner model as a layer. Even if you run a much smaller model, such as the one below:
input_inner = Input(shape=(4,), name='input_inner')
output_inner = Dense(3, name='inner_dense')(input_inner)
inner_model = Model(inputs=input_inner, outputs=output_inner)
input = Input(shape=(5,), name='input')
output = inner_model(input)
model = Model(inputs=input, outputs=output)
You will see that relevant_nodes contains two elements, while via
for i in model.layers:
    print(i.inbound_nodes)
you'll get three nodes.
This is because layer 1 (of the smaller model above) has two nodes, but only the second one is considered part of the model. In particular, if you print the input at each one of the nodes at layer 1 with layer.get_input_at(node_index), you'll get:
print(model.layers[1].get_input_at(0))
print(model.layers[1].get_input_at(1))
#prints
/input_inner
/input
4. Answers to the questions in the comment
1) Do you also know what this non-relevant node is good for / where it
comes from?
This node seems to be an "internal node" created during the application of inner_model. In particular, if you print the input and output shape at each one of the three nodes (in the small model above), you get:
nodes = [model.layers[0].inbound_nodes[0],
         model.layers[1].inbound_nodes[0],
         model.layers[1].inbound_nodes[1]]
for i in nodes:
    print(i.input_shapes)
    print(i.output_shapes)
    print(" ")
#prints
[(None, 5)]
[(None, 5)]
[(None, 4)]
[(None, 3)]
[(None, 5)]
[(None, 3)]
so you can see that the shapes of the middle node (the one that does not appear in the list of relevant nodes) correspond to the shapes in inner_model.
2) Will an inner model with n output nodes always present them with node
indices 1 to n instead of 0 to n-1?
I am not sure whether this always holds, as I guess there are various ways to end up with several output nodes, but if I consider the following quite natural generalization of the small model above, it is indeed the case:
input_inner = Input(shape=(4,), name='input_inner')
output_inner = Dense(3, name='inner_dense')(input_inner)
inner_model = Model(inputs=input_inner, outputs=output_inner)
input = Input(shape=(5,), name='input')
output = inner_model(input)
output = inner_model(output)
model = Model(inputs=input, outputs=output)
print(model.summary())
Here I just added output = inner_model(output) to the small model. The list of relevant nodes is
[<keras.engine.topology.Node at 0xd10c390>,
<keras.engine.topology.Node at 0xd10c9b0>,
<keras.engine.topology.Node at 0xd10ca20>]
and the list of all inbound nodes is
[<keras.engine.topology.Node object at 0x000000000D10CA20>]
[<keras.engine.topology.Node object at 0x000000000D10C588>, <keras.engine.topology.Node object at 0x000000000D10C9B0>, <keras.engine.topology.Node object at 0x000000000D10C390>]
Indeed the node indices are 1 and 2, as you mentioned in the comment. It will continue similarly if I add another output = inner_model(output), with node indices being 1,2,3 and so on.
Updated in Sep 2020. The selected answer is a bit outdated (the link no longer points to the right place) and does not exactly answer the question about model_1[1][0]: why is the node index 1 in [1][0]? Here's what I found.
The code I played with is below (I added names for the layers for better readability). You can copy and run it to see the output info.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
input_inner = layers.Input(shape=(4,), name='inn_input')
output_inner = layers.Dense(3, name='inn_dense')(input_inner)
inner_model = keras.Model(inputs=input_inner, outputs=output_inner,name='inn_model')
inn_allLayers = inner_model.layers
# print(type(inn_allLayers))
print(inner_model.name,': total layer number:',len(inn_allLayers))
for i in inn_allLayers:
    print(i.name, i)
    print(len(i._inbound_nodes))
    for n in i._inbound_nodes:
        print(n.get_config())
        print(n)
    print('===================')
print('************************************************')
nest_input = layers.Input(shape=(5,), name='nest_input')
nest_d1_out = layers.Dense(4, name='nest_dense_1')(nest_input)
nest_m_out = inner_model(nest_d1_out)
nest_d2_out = layers.Dense(2, name='nest_dense_2')(nest_m_out)
nest_add_out = layers.concatenate([nest_d2_out, nest_d2_out], name='nest_concat')
model = keras.Model(inputs=nest_input, outputs=nest_add_out,name='nest_model')
inn_allLayers = inner_model.layers
# print(type(inn_allLayers))
print(inner_model.name,': total layer number:',len(inn_allLayers))
for i in inn_allLayers:
    print(i.name, i)
    print(len(i._inbound_nodes))
    for n in i._inbound_nodes:
        print(n.get_config())
        print(n)
    print('===================')
print('************************************************')
allLayers = model.layers
# print(type(allLayers))
print(model.name,': total layer number:',len(allLayers))
for i in allLayers:
    print(i.name, i)
    print(len(i._inbound_nodes))
    for n in i._inbound_nodes:
        print(n.get_config())
        print(n)
    print('===================')
# note: tf.get_default_graph() only exists in TF 1.x graph mode;
# in TF 2.x use tf.compat.v1.get_default_graph()
for op in tf.get_default_graph().get_operations():
    print(str(op.name))
1. [1][0] represents [node_index][tensor_index]
2. What is node_index?
Under tensorflow/python/keras/engine/base_layer.py, it's described in this class:
class KerasHistory(
    collections.namedtuple('KerasHistory',
                           ['layer', 'node_index', 'tensor_index'])):
  """Tracks the Layer call that created a Tensor, for Keras Graph Networks.

  During construction of Keras Graph Networks, this metadata is added to
  each Tensor produced as the output of a Layer, starting with an
  `InputLayer`. This allows Keras to track how each Tensor was produced, and
  this information is later retraced by the `keras.engine.Network` class to
  reconstruct the Keras Graph Network.

  Attributes:
    layer: The Layer that produced the Tensor.
    node_index: The specific call to the Layer that produced this Tensor. Layers
      can be called multiple times in order to share weights. A new node is
      created every time a Tensor is called.
    tensor_index: The output index for this Tensor. Always zero if the Layer
      that produced this Tensor only has one output. Nested structures of
      Tensors are deterministically assigned an index via `nest.flatten`.
  """
  # Added to maintain memory and performance characteristics of `namedtuple`
  # while subclassing.
It says a Node is created each time a Tensor is called. To me, this is a bit vague. My understanding is that when a layer is called it produces a Tensor, and calling the layer in different ways (or multiple times) creates multiple nodes (I will show some print results later).
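For illustration, here is a minimal hedged sketch (the layer and variable names are made up) showing that calling the same layer twice gives it two inbound nodes, which is exactly what happens to inn_dense once the inner model is applied inside the outer one:
shared = layers.Dense(3, name='shared_dense')
a = layers.Input(shape=(4,), name='a')
b = layers.Input(shape=(4,), name='b')
out_a = shared(a)  # first call -> node 0 on `shared`
out_b = shared(b)  # second call -> node 1 on `shared`
print(len(shared._inbound_nodes))  # prints 2
for n in shared._inbound_nodes:
    print(n.get_config())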
3. How to print each node?
Under the same py file, there is this snippet:
# Create node, add it to inbound nodes.
Node(
    self,
    inbound_layers=inbound_layers,
    node_indices=node_indices,
    tensor_indices=tensor_indices,
    input_tensors=input_tensors,
    output_tensors=output_tensors,
    arguments=arguments)
# Update tensor history metadata.
# The metadata attribute consists of
# 1) a layer instance
# 2) a node index for the layer
# 3) a tensor index for the node.
# The allows layer reuse (multiple nodes per layer) and multi-output
# or multi-input layers (e.g. a layer can return multiple tensors,
# and each can be sent to a different layer).
for i, tensor in enumerate(nest.flatten(output_tensors)):
    tensor._keras_history = KerasHistory(self,
                                         len(self._inbound_nodes) - 1, i)
Here self refers to the Layer object. The info is recorded in each tensor's _keras_history and in the layer's _inbound_nodes attribute. Hence we can print a specific node with print(layer._inbound_nodes[index_of_node].get_config()). I already included runnable code for this at the beginning.
(What are inbound and outbound nodes? It is a bit confusing at first, but if you imagine each node as an arrow pointing from one layer to another layer, it becomes easier. The code description is below.)
class Node(object):
  """A `Node` describes the connectivity between two layers.

  Each time a layer is connected to some new input,
  a node is added to `layer._inbound_nodes`.
  Each time the output of a layer is used by another layer,
  a node is added to `layer._outbound_nodes`.

  Arguments:
      outbound_layer: the layer that takes
          `input_tensors` and turns them into `output_tensors`
          (the node gets created when the `call`
          method of the layer was called).
      inbound_layers: a list of layers, the same length as `input_tensors`,
          the layers from where `input_tensors` originate.
      node_indices: a list of integers, the same length as `inbound_layers`.
          `node_indices[i]` is the origin node of `input_tensors[i]`
          (necessary since each inbound layer might have several nodes,
          e.g. if the layer is being shared with a different data stream).
      tensor_indices: a list of integers,
          the same length as `inbound_layers`.
          `tensor_indices[i]` is the index of `input_tensors[i]` within the
          output of the inbound layer
          (necessary since each inbound layer might
          have multiple tensor outputs, with each one being
          independently manipulable).
      input_tensors: list of input tensors.
      output_tensors: list of output tensors.
      arguments: dictionary of keyword arguments that were passed to the
          `call` method of the layer at the call that created the node.

  `node_indices` and `tensor_indices` are basically fine-grained coordinates
  describing the origin of the `input_tensors`.

  A node from layer A to layer B is added to:
  - A._outbound_nodes
  - B._inbound_nodes
  """
4. Observe node creation.
You might notice that there are two identical print blocks for inner_model in the code: one before the nested model is built, and one after.
The output is as below:
inn_model : total layer number: 2
inn_input <tensorflow.python.keras.engine.input_layer.InputLayer object at 0x7fd1c6755780>
1
{'outbound_layer': 'inn_input', 'inbound_layers': [], 'node_indices': [], 'tensor_indices': []}
<tensorflow.python.keras.engine.base_layer.Node object at 0x7fd1d2e75e10>
===================
inn_dense <tensorflow.python.keras.layers.core.Dense object at 0x7fd1d2e75e80>
1
{'outbound_layer': 'inn_dense', 'inbound_layers': 'inn_input', 'node_indices': 0, 'tensor_indices': 0}
<tensorflow.python.keras.engine.base_layer.Node object at 0x7fd1d2e92550>
===================
************************************************
inn_model : total layer number: 2
inn_input <tensorflow.python.keras.engine.input_layer.InputLayer object at 0x7fd1c6755780>
1
{'outbound_layer': 'inn_input', 'inbound_layers': [], 'node_indices': [], 'tensor_indices': []}
<tensorflow.python.keras.engine.base_layer.Node object at 0x7fd1d2e75e10>
===================
inn_dense <tensorflow.python.keras.layers.core.Dense object at 0x7fd1d2e75e80>
2
{'outbound_layer': 'inn_dense', 'inbound_layers': 'inn_input', 'node_indices': 0, 'tensor_indices': 0}
<tensorflow.python.keras.engine.base_layer.Node object at 0x7fd1d2e92550>
{'outbound_layer': 'inn_dense', 'inbound_layers': 'nest_dense_1', 'node_indices': 0, 'tensor_indices': 0}
<tensorflow.python.keras.engine.base_layer.Node object at 0x7fd1d2ac4358>
===================
************************************************
You will notice immediately that after the nested model is built, one extra (inbound) node, i.e. one extra arrow pointing to inn_dense, has been created. One arrow points from inn_input to inn_dense, and another points from nest_dense_1 to inn_dense. This is what was said earlier: each time a layer is called, a new node (an arrow) is created.
5. Question answered
So far, I think this already explains the original question of why the index is 1 in [1][0]. It is because reusing the inner_model causes the inn_dense layer to be called, and thus to create a Tensor, a second time.
The rest of the code snippet prints a bit of extra information; you can check it out to get a better idea of what happens under the hood.
It seems it is now "_nodes_by_depth" instead of "nodes_by_depth". The same goes for inbound_nodes (now _inbound_nodes), etc. Maybe the answer has to be updated.
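If you want code that works across versions, a small hedged compatibility helper (the function name here is made up for illustration) can hide the rename:
def get_nodes_by_depth(model):
    # newer Keras/TF versions use the private name _nodes_by_depth
    if hasattr(model, '_nodes_by_depth'):
        return model._nodes_by_depth
    return model.nodes_by_depth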
Related
I have used MobileNetV2 as the architecture, with "imagenet" weights, to classify between 4 classes of X-rays. I got very good accuracy, so I saved the weights (Bioiatriki_project.h5). I now want to reuse these weights for another classification task, but this time with two classes instead of four. So my code in this second part for creating the model is:
def create_model(pretrained=True):
    mobile_model = MobileNetV2(
        weights='/content/drive/MyDrive/Bioiatriki_project.h5',
        input_shape=input_img_size,
        alpha=1,
        include_top=False)
    print("mobileNetV2 has {} layers".format(len(mobile_model.layers)))
    if pretrained:
        for layer in mobile_model.layers[:-50]:
            layer.trainable = False
        for layer in mobile_model.layers[-50:]:
            layer.trainable = True
    else:
        for layer in mobile_model.layers:
            layer.trainable = True

    model = mobile_model.layers[-3].output
    model = layers.GlobalAveragePooling2D()(model)
    model = layers.Dense(num_classes, activation="softmax", kernel_initializer='uniform')(model)
    model = Model(inputs=mobile_model.input, outputs=model)
    return model
This throws the following error:
ValueError: Weight count mismatch for layer #103 (named Conv_1_bn in the current model, dense in the save file). Layer expects 4 weight(s). Received 2 saved weight(s)
So how can I fix this?
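One possible direction (a hedged sketch, not a verified fix): the error suggests the saved .h5 file contains weights for a model that still includes the old 4-class head, so it cannot be loaded positionally into a bare include_top=False MobileNetV2. Building the backbone without weights and loading the file by layer name, skipping mismatches, may work, assuming the backbone layers kept the standard MobileNetV2 names when the weights were saved:
mobile_model = MobileNetV2(weights=None,
                           input_shape=input_img_size,
                           alpha=1,
                           include_top=False)
# by_name matches layers by name, and skip_mismatch ignores layers whose
# saved weights do not fit (e.g. the old classifier head)
mobile_model.load_weights('/content/drive/MyDrive/Bioiatriki_project.h5',
                          by_name=True, skip_mismatch=True)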
I downloaded a dataset for facial keypoint detection. The images and the labels were in a CSV file, which I extracted using pandas, but I don't know how to convert them into tensors and load them into a data loader for training.
dataframe = pd.read_csv("training_facial_keypoints.csv")
dataframe['Image'] = dataframe['Image'].apply(lambda i: np.fromstring(i, sep=' '))
dataframe= dataframe.dropna()
images_array = np.vstack(dataframe['Image'].values)/255.0
images_array = images_array.astype(np.float32)
images_array = images_array.reshape(-1, 96, 96, 1)
print(images_array.shape)
labels_array = dataframe[dataframe.columns[:-1]].values
labels_array = (labels_array-48)/48
labels_array = labels_array.astype(np.float32)
I have the images and labels in two arrays. How do I create a training set from these, apply transforms, and then load it using a DataLoader?
Create a subclass of torch.utils.data.Dataset and fill it with your data.
You can pass the desired torchvision.transforms to it and apply them to your data in __getitem__(self, index).
Then you can pass it to torch.utils.data.DataLoader, which allows multi-threaded loading of the data.
PyTorch also has extensive documentation that you should refer to first.
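A minimal sketch of such a Dataset, wrapping the two numpy arrays from the question (the class and variable names here are illustrative, and the transform argument is optional):
import torch
from torch.utils.data import Dataset, DataLoader

class KeypointsDataset(Dataset):
    def __init__(self, images, labels, transform=None):
        self.images = images      # (N, 96, 96, 1) float32
        self.labels = labels      # (N, num_keypoint_coords) float32
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        image = self.images[index]
        if self.transform is not None:
            image = self.transform(image)
        else:
            # HWC -> CHW, since PyTorch conv layers expect channels first
            image = torch.from_numpy(image).permute(2, 0, 1)
        label = torch.from_numpy(self.labels[index])
        return image, label

train_set = KeypointsDataset(images_array, labels_array)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=2)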
I am implementing Yoon Kim's CNN (https://arxiv.org/abs/1408.5882) for text classification in Julia, using Flux as the deep learning framework, with individual sentences as input datapoints. The model zoo (https://github.com/FluxML/model-zoo) has proven useful to an extent, but it does not have an NLP example with CNNs. I'd like to check if my input data format is the correct one.
There is no explicit implementation in Flux of a 1D Conv, so I'm using Conv found in https://github.com/FluxML/Flux.jl/blob/master/src/layers/conv.jl
Here is part of the docstring that explains the input data format:
Data should be stored in WHCN order (width, height, # channels, # batches).
In other words, a 100×100 RGB image would be a `100×100×3×1` array,
and a batch of 50 would be a `100×100×3×50` array.
My format is as follows:
1. width: since text in a sentence is 1D, the width is always 1
2. height: this is the maximum number of tokens allowable in a sentence
3. # of channels: this is the embedding size
4. # of batches: the number of sentences in each batch
Following the MNIST example in the model zoo, I have
function make_minibatch(X, Y, idxs)
    X_batch = zeros(1, num_sentences, emb_dims, MAX_LEN)
    function get_sentence_matrix(sentence)
        embeddings = Vector{Array{Float64, 1}}()
        for word in sentence
            embedding = get_embedding(word)
            push!(embeddings, embedding)
        end
        embeddings = hcat(embeddings...)
        return embeddings
    end
    for i in 1:length(idxs)
        X_batch[1, i, :, :] = get_sentence_matrix(X[idxs[i]])
    end
    Y_batch = [Flux.onehot(label+1, 1:2) for label in Y[idxs]]
    return (X_batch, Y_batch)
end
where X is an array of arrays of words and the get_embedding function returns an embedding as an array.
X_batch is then an Array{Float64,4}. Is this the correct approach?
Does anyone know how to get the alignment weights when translating in OpenNMT-py? Usually the only output is the resulting sentences, and I have tried to find a debugging flag or similar for the attention weights. So far, I have been unsuccessful.
I'm not sure if this is a new feature, since I did not come across this when looking for alignments a few months back, but onmt seems to have added a flag -report_align to output word alignments along with the translation.
https://opennmt.net/OpenNMT-py/FAQ.html#raw-alignments-from-averaging-transformer-attention-heads
Excerpt from opennmt.net:
Currently, we support producing word alignment while translating for Transformer based models. Using -report_align when calling translate.py will output the inferred alignments in Pharaoh format. Those alignments are computed from an argmax on the average of the attention heads of the second to last decoder layer.
You can get the attention matrices. Note that this is not the same as alignment, which is a term from statistical (not neural) machine translation.
There is a thread on GitHub discussing it. Here is a snippet from the discussion. When you get the translations from the model, the attentions are in the attn field.
import onmt
import onmt.io
import onmt.translate
import onmt.ModelConstructor
from collections import namedtuple
# Load the model.
Opt = namedtuple('Opt', ['model', 'data_type', 'reuse_copy_attn', "gpu"])
opt = Opt("PATH_TO_SAVED_MODEL", "text", False, 0)
fields, model, model_opt = onmt.ModelConstructor.load_test_model(
    opt, {"reuse_copy_attn": False})
# Test data
data = onmt.io.build_dataset(
    fields, "text", "PATH_TO_DATA", None, use_filter_pred=False)
data_iter = onmt.io.OrderedIterator(
    dataset=data, device=0,
    batch_size=1, train=False, sort=False,
    sort_within_batch=True, shuffle=False)
# Translator
translator = onmt.translate.Translator(
    model, fields, beam_size=5, n_best=1,
    global_scorer=None, cuda=True)
builder = onmt.translate.TranslationBuilder(
    data, translator.fields, 1, False, None)
batch = next(data_iter)
batch_data = translator.translate_batch(batch, data)
translations = builder.from_batch(batch_data)
translations[0].attn # <--- here are the attentions
I want to set up a caffe CNN with Python, using the caffe.NetSpec() interface. Although I saw that we can put the test net in solver.prototxt, I would like to write it in model.prototxt with a different phase. For example, a caffe model prototxt can implement two data layers with different phases:
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  ....
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  ....
}
How can I do this in Python to get such an implementation?
I assume you mean how to define phase when writing a prototxt using caffe.NetSpec?
from caffe import layers as L, params as P, to_proto
import caffe
ns = caffe.NetSpec()
ns.data = L.Data(name="data",
                 data_param={'source': '/path/to/lmdb', 'batch_size': 32},
                 include={'phase': caffe.TEST})
If you want to have BOTH train and test layers in the same prototxt, what I usually do is make one ns for train with ALL layers, and another ns_test with only the test versions of the duplicated layers. Then, when writing the actual prototxt file:
with open('model.prototxt', 'w') as W:
W.write('%s\n' % ns_test.to_proto())
W.write('%s\n' % ns.to_proto())
This way you'll have BOTH phases in the same prototxt. A bit hacky, I know.
I found a useful method.
You can add a key named name for your test-phase layer, and modify the keys ntop and top,
just like this:
net.data = L.Data(name='data',
                  include=dict(phase=caffe_pb2.Phase.Value('TRAIN')),
                  ntop=1)
net.test_data = L.Data(name='data',
                       include=dict(phase=caffe_pb2.Phase.Value('TEST')),
                       top='data',
                       ntop=0)
If your network is like:
layer {phase: TRAIN}
layer {phase: TEST}
layer {}
layer {phase: TRAIN}
layer {}
layer {phase: TEST}
layer {}
layer {}
layer {phase: TEST}
Create a train net ns.
Create a test net ns_test.
Now you basically have two strings, str(ns.to_proto()) and str(ns_test.to_proto()).
Merge those two using a Python regex, taking into account the required layer order.
I found another way.
I could solve this problem by returning the proto string.
Basically, you can prepend the string of the layers that are going to be replaced (in my case, the first layer).
def lenet(path_to_lmdb_train, path_to_lmdb_test,
          batch_size_train, batch_size_test):
    n = caffe.NetSpec()
    n.data, n.label = L.Data(batch_size=batch_size_train, backend=P.Data.LMDB, source=path_to_lmdb_train,
                             include=dict(phase=caffe.TRAIN), transform_param=dict(scale=1./255), ntop=2)
    first_layer = str(n.to_proto())

    n.data, n.label = L.Data(batch_size=batch_size_test, backend=P.Data.LMDB, source=path_to_lmdb_test,
                             include=dict(phase=caffe.TEST), transform_param=dict(scale=1./255), ntop=2)
    n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20, weight_filler=dict(type='xavier'))
    n.pool1 = L.Pooling(n.conv1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.conv2 = L.Convolution(n.pool1, kernel_size=5, num_output=50, weight_filler=dict(type='xavier'))
    n.pool2 = L.Pooling(n.conv2, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.ip1 = L.InnerProduct(n.pool2, num_output=500, weight_filler=dict(type='xavier'))
    n.relu1 = L.ReLU(n.ip1, in_place=True)
    n.ip2 = L.InnerProduct(n.relu1, num_output=10, weight_filler=dict(type='xavier'))
    n.loss = L.SoftmaxWithLoss(n.ip2, n.label)
    n.accuracy = L.Accuracy(n.ip2, n.label, include=dict(phase=caffe.TEST))
    return first_layer + str(n.to_proto())
Although several answers have been given, none covers a more realistic scenario where you don't even know, at the moment of writing the code, the names of your layers. For example, when you're assembling a network from smaller blocks, you can't write:
n.data = L.Data(#...
n.test_data = L.Data(#...
Because every next instantiation of the block would overwrite data and test_data (or batchnorm, which is more likely to be put in blocks).
Fortunately, you can assign to the NetSpec object via bracket indexing (NetSpec implements __getitem__/__setitem__), like so:
layer_name = 'norm{}'.format(i) #for example
n[layer_name + '_train'] = L.Data(#...
n[layer_name + '_test'] = L.Data(#...
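A short hedged sketch of how this helps when assembling repeated blocks (the layer types and parameters below are made up for illustration; only the n[...] assignment pattern is the point):
def add_block(n, bottom, i):
    name = 'block{}'.format(i)
    n[name + '_conv'] = L.Convolution(bottom, kernel_size=3, num_output=64,
                                      weight_filler=dict(type='xavier'))
    n[name + '_relu'] = L.ReLU(n[name + '_conv'], in_place=True)
    return n[name + '_relu']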