Input Text Data Formatting for CNN in Flux, in Julia - deep-learning

I am implementing Yoon Kim's CNN (https://arxiv.org/abs/1408.5882) for text classification in Julia, using Flux as the deep learning framework, with individual sentences as input datapoints. The model zoo (https://github.com/FluxML/model-zoo) has proven useful to an extent, but it does not have an NLP example with CNNs. I'd like to check if my input data format is the correct one.
There is no explicit implementation in Flux of a 1D Conv, so I'm using Conv found in https://github.com/FluxML/Flux.jl/blob/master/src/layers/conv.jl
Here is part of the docstring that explains the input data format:
Data should be stored in WHCN order (width, height, # channels, # batches).
In other words, a 100×100 RGB image would be a `100×100×3×1` array,
and a batch of 50 would be a `100×100×3×50` array.
My format is as follows:
1. width: since text in a sentence is 1D, the width is always 1
2. height: this is the maximum number of tokens allowable in a sentence
3. # of channels: this is the embedding size
4. # of batches: the number of sentences in each batch
Following the MNIST example in the model zoo, I have
function make_minibatch(X, Y, idxs)
    X_batch = zeros(1, num_sentences, emb_dims, MAX_LEN)
    function get_sentence_matrix(sentence)
        embeddings = Vector{Array{Float64, 1}}()
        for word in sentence
            embedding = get_embedding(word)
            push!(embeddings, embedding)
        end
        embeddings = hcat(embeddings...)
        return embeddings
    end
    for i in 1:length(idxs)
        X_batch[1, i, :, :] = get_sentence_matrix(X[idxs[i]])
    end
    Y_batch = [Flux.onehot(label+1, 1:2) for label in Y[idxs]]
    return (X_batch, Y_batch)
end
where X is an array of arrays of words and the get_embedding function returns an embedding as an array.
X_batch is then an Array{Float64,4}. Is this the correct approach?
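As a quick cross-check of the layout described above, the following sketch (written in Python rather than Julia, purely for illustration) compares the WHCN shape implied by the four-item list with the shape allocated in make_minibatch. The helper name whcn_shape and the sizes are made up for the example.

```python
def whcn_shape(max_len, emb_dims, batch_size):
    """Shape implied by the list above: (width, height, channels, batch)."""
    return (1, max_len, emb_dims, batch_size)


# Hypothetical sizes, chosen only for illustration.
MAX_LEN, emb_dims, num_sentences = 50, 300, 32

described = whcn_shape(MAX_LEN, emb_dims, num_sentences)
# Argument order of the zeros(...) call in the snippet above.
allocated = (1, num_sentences, emb_dims, MAX_LEN)

print(described)               # (1, 50, 300, 32)
print(allocated)               # (1, 32, 300, 50)
print(described == allocated)  # False
```

If the list is the intended layout, the sentence index would go in the last dimension rather than the second. Note also that hcat over the embedding vectors gives an emb_dims × n_tokens matrix, so it would probably need transposing and padding to MAX_LEN before it fits a MAX_LEN × emb_dims slice.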

Related

Two stages transfer learning

I used MobileNetV2 as the architecture with "imagenet" weights to classify X-rays into 4 classes. The accuracy was very good, so I saved the weights (Bioiatriki_project.h5). I now want to reuse these weights for another classification task, this time with two classes instead of four. My code for creating the model in this second stage is:
def create_model(pretrained=True):
    mobile_model = MobileNetV2(
        weights='/content/drive/MyDrive/Bioiatriki_project.h5',
        input_shape=input_img_size,
        alpha=1,
        include_top=False)
    print("mobileNetV2 has {} layers".format(len(mobile_model.layers)))
    if pretrained:
        for layer in mobile_model.layers[:-50]:
            layer.trainable=False
        for layer in mobile_model.layers[-50:]:
            layer.trainable=True
    else:
        for layer in mobile_model.layers:
            layer.trainable = True
    model = mobile_model.layers[-3].output
    model = layers.GlobalAveragePooling2D()(model)
    model = layers.Dense(num_classes, activation="softmax", kernel_initializer='uniform')(model)
    model = Model(inputs=mobile_model.input, outputs=model)
    return model
This raises the following error:
ValueError: Weight count mismatch for layer #103 (named Conv_1_bn in the current model, dense in the save file). Layer expects 4 weight(s). Received 2 saved weight(s)
So how can I fix this?

Improving the recall of a Custom Named Entity Recognition (NER) in Spacy

This is the second part of another question I posted. The two are different enough to be separate questions, but they are related.
Previous question
Building a Custom Named Entity Recognition with Spacy , using random text as a sample
I have built a custom Named Entity Recognition (NER) using the method described in the previous question. From here, I just copied the method to build the NER from the Spacy website (under "Named Entity Recognizer" at this website https://spacy.io/usage/training#ner)
The custom NER works, sort of. If I sentence-tokenize the text and lemmatize the words (so "strawberries" becomes "strawberry"), it can pick up an entity. However, it stops there: it sometimes picks up two entities, but very rarely.
Is there anything I can do to improve its accuracy?
Here is the code. I have TRAIN_DATA in this format, but for food items:
TRAIN_DATA = [
    ("Uber blew through $1 million a week", {"entities": [(0, 4, "ORG")]}),
    ("Google rebrands its business apps", {"entities": [(0, 6, "ORG")]}),
]
The data is in the object train_food.
import random
import warnings
import spacy
import nltk
from spacy.util import minibatch, compounding
nlp = spacy.blank("en")
# Create the built-in pipeline component and add it to the pipeline
if "ner" not in nlp.pipe_names:
    ner = nlp.create_pipe("ner")
    nlp.add_pipe(ner, last=True)
else:
    ner = nlp.get_pipe("ner")
# Testing for food: register every entity label found in the training data
for _, annotations in train_food:
    for ent in annotations.get("entities"):
        ner.add_label(ent[2])
# get names of other pipes to disable them during training
pipe_exceptions = ["ner", "trf_wordpiecer", "trf_tok2vec"]
other_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]
model = "en"
n_iter = 20
# only train NER
with nlp.disable_pipes(*other_pipes), warnings.catch_warnings():
    # show warnings for misaligned entity spans once
    warnings.filterwarnings("once", category=UserWarning, module='spacy')
    # reset and initialize the weights randomly, but only if we're
    # training a new model
    nlp.begin_training()
    for itn in range(n_iter):
        random.shuffle(train_food)
        losses = {}
        # batch up the examples using spaCy's minibatch
        batches = minibatch(train_food, size=compounding(4.0, 32.0, 1.001))
        for batch in batches:
            texts, annotations = zip(*batch)
            nlp.update(
                texts,  # batch of texts
                annotations,  # batch of annotations
                drop=0.5,  # dropout - make it harder to memorise data
                losses=losses,
            )
        print("Losses", losses)
text = "mike went to the supermarket today. he went and bought a potatoes, carrots, towels, garlic, soap, perfume, a fridge, a tomato, tomatoes and tuna."
After this, and using text as a sample, I ran this code
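from nltk.stem import WordNetLemmatizer
# Assumption: the post does not show how lemmatizer is defined;
# NLTK's WordNetLemmatizer is used here as a likely candidate.
lemmatizer = WordNetLemmatizer()
def text_processor(text):
    # lowercase, tokenize and lemmatize the text, then join it back together
    text = text.lower()
    token = nltk.word_tokenize(text)
    ls = []
    for x in token:
        p = lemmatizer.lemmatize(x)
        ls.append(f"{p}")
    new_text = " ".join(map(str,ls))
    return new_text
def ner (text):
    # run the trained pipeline sentence by sentence and print the entities
    new_text = text_processor(text)
    tokenizer = nltk.PunktSentenceTokenizer()
    sentences = tokenizer.tokenize(new_text)
    for sentence in sentences:
        doc = nlp(sentence)
        for ent in doc.ents:
            print(ent.text, ent.label_)
ner(text)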
This results in
potato FOOD
carrot FOOD
Running the following code
ner("mike went to the supermarket today. he went and bought garlic and tuna")
Results in
garlic FOOD
Ideally, I want the NER to pick up potato, carrot and garlic. Is there anything I can do?
Thank you
Kah
While you are training your model, you can try some information retrieval techniques such as:
1. Lowercasing all of the words
2. Replacing words with their synonyms
3. Removing stop words
4. Rewriting sentences (this can be done automatically with back-translation, i.e. translating into another language such as Arabic and then translating back into English)
Also, consider using better models such as:
http://nlp.stanford.edu:8080/corenlp
https://huggingface.co/models
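A minimal sketch of the first suggestion, assuming training data in the (text, {"entities": [...]}) format shown in the question. For plain ASCII text, lowercasing keeps the string length, so the character offsets stay valid. Removing stop words or swapping in synonyms shifts the offsets, so the entity spans would have to be recomputed. The helper name lowercase_example is hypothetical.

```python
def lowercase_example(example):
    """Lowercase the text of one (text, annotations) training example."""
    text, annotations = example
    lowered = text.lower()
    # The character offsets are only reusable if the length did not change.
    if len(lowered) != len(text):
        raise ValueError("lowercasing changed the text length; recompute spans")
    return lowered, annotations


train_data = [
    ("Uber blew through $1 million a week", {"entities": [(0, 4, "ORG")]}),
    ("Google rebrands its business apps", {"entities": [(0, 6, "ORG")]}),
]

lowered_data = [lowercase_example(ex) for ex in train_data]
for text, annotations in lowered_data:
    for start, end, label in annotations["entities"]:
        print(text[start:end], label)
# uber ORG
# google ORG
```

The same normalization would have to be applied to the text at prediction time, which the text_processor function in the question already does with text.lower().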

How to extract images, labels from csv file and create a trainset using torch?

I downloaded a dataset for facial keypoint detection. The images and the labels were in a CSV file, which I read using pandas, but I don't know how to convert the data into tensors and load it into a data loader for training.
import numpy as np
import pandas as pd
dataframe = pd.read_csv("training_facial_keypoints.csv")
dataframe['Image'] = dataframe['Image'].apply(lambda i: np.fromstring(i, sep=' '))
dataframe= dataframe.dropna()
images_array = np.vstack(dataframe['Image'].values)/255.0
images_array = images_array.astype(np.float32)
images_array = images_array.reshape(-1, 96, 96, 1)
print(images_array.shape)
labels_array = dataframe[dataframe.columns[:-1]].values
labels_array = (labels_array-48)/48
labels_array = labels_array.astype(np.float32)
I have the images and labels in two arrays. How do I create a training set from them, apply transforms, and then load it using a DataLoader?
Create a subclass of torch.utils.data.Dataset and fill it with your data.
You can pass the desired torchvision.transforms to it and apply them to your data in __getitem__(self, index).
Then you can pass it to torch.utils.data.DataLoader, which can load the data in parallel using several worker processes.
PyTorch also has extensive documentation, which you should consult first.
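The steps above can be sketched as follows. This is a minimal example under stated assumptions, not a definitive implementation: the class name KeypointDataset is hypothetical, the arrays are random stand-ins with the shapes from the question, and 30 label columns are assumed.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class KeypointDataset(Dataset):
    """Wraps the image and label arrays (hypothetical class name)."""

    def __init__(self, images, labels, transform=None):
        self.images = images      # shape (N, 96, 96, 1), float32
        self.labels = labels      # shape (N, num_labels), float32
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        # Convert HWC to CHW, the layout PyTorch conv layers expect.
        image = torch.from_numpy(self.images[index]).permute(2, 0, 1)
        label = torch.from_numpy(self.labels[index])
        if self.transform is not None:
            image = self.transform(image)
        return image, label


# Random stand-ins for images_array and labels_array from the question.
images_array = np.random.rand(8, 96, 96, 1).astype(np.float32)
labels_array = np.random.rand(8, 30).astype(np.float32)

dataset = KeypointDataset(images_array, labels_array)
loader = DataLoader(dataset, batch_size=4, shuffle=True)

images, labels = next(iter(loader))
print(images.shape, labels.shape)
```

One caveat: a transform that moves pixels (flips, crops, rotations) would also have to be applied to the keypoint coordinates, so such transforms probably need to operate on the image and label together rather than on the image alone.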

Getting alignment/attention during translation in OpenNMT-py

Does anyone know how to get the alignment weights when translating with OpenNMT-py? Usually the only output is the resulting sentences. I have tried to find a debugging flag or something similar for the attention weights, but so far I have been unsuccessful.
I'm not sure if this is a new feature, since I did not come across this when looking for alignments a few months back, but onmt seems to have added a flag -report_align to output word alignments along with the translation.
https://opennmt.net/OpenNMT-py/FAQ.html#raw-alignments-from-averaging-transformer-attention-heads
Excerpt from opennmt.net:
Currently, we support producing word alignment while translating for Transformer based models. Using -report_align when calling translate.py will output the inferred alignments in Pharaoh format. Those alignments are computed from an argmax on the average of the attention heads of the second to last decoder layer.
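To make the excerpt concrete, here is a toy sketch of "argmax on the average of the attention heads" that emits Pharaoh-format pairs (source index, hyphen, target index). It is not OpenNMT-py's code; the function name pharaoh_from_heads and the attention values are made up for illustration.

```python
def pharaoh_from_heads(heads):
    """heads[h][t][s]: attention of head h from target token t to source token s."""
    num_heads = len(heads)
    num_tgt = len(heads[0])
    num_src = len(heads[0][0])
    pairs = []
    for t in range(num_tgt):
        # Average the heads for this target position.
        avg = [sum(head[t][s] for head in heads) / num_heads
               for s in range(num_src)]
        # Pick the source position with the highest averaged attention.
        best_src = max(range(num_src), key=lambda s: avg[s])
        pairs.append(f"{best_src}-{t}")
    return " ".join(pairs)


# Two heads, three target tokens, three source tokens (toy values).
heads = [
    [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.6, 0.2]],
    [[0.9, 0.1, 0.0], [0.2, 0.2, 0.6], [0.1, 0.8, 0.1]],
]
print(pharaoh_from_heads(heads))  # 0-0 2-1 1-2
```

Each pair reads "source position - target position", so the output says that target token 1 is aligned to source token 2, and so on.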
You can get the attention matrices. Note that attention is not the same as alignment, which is a term from statistical (not neural) machine translation.
There is a thread on GitHub discussing it. Here is a snippet from the discussion. When you get the translations from the model, the attentions are in the attn field.
import onmt
import onmt.io
import onmt.translate
import onmt.ModelConstructor
from collections import namedtuple
# Load the model.
Opt = namedtuple('Opt', ['model', 'data_type', 'reuse_copy_attn', "gpu"])
opt = Opt("PATH_TO_SAVED_MODEL", "text", False, 0)
fields, model, model_opt = onmt.ModelConstructor.load_test_model(
    opt, {"reuse_copy_attn" : False})
# Test data
data = onmt.io.build_dataset(
    fields, "text", "PATH_TO_DATA", None, use_filter_pred=False)
data_iter = onmt.io.OrderedIterator(
    dataset=data, device=0,
    batch_size=1, train=False, sort=False,
    sort_within_batch=True, shuffle=False)
# Translator
translator = onmt.translate.Translator(
    model, fields, beam_size=5, n_best=1,
    global_scorer=None, cuda=True)
builder = onmt.translate.TranslationBuilder(
    data, translator.fields, 1, False, None)
batch = next(data_iter)
batch_data = translator.translate_batch(batch, data)
translations = builder.from_batch(batch_data)
translations[0].attn # <--- here are the attentions

Understanding Keras model architecture (node index of nested model)

This script defines a dummy model that uses a small nested model:
from keras.layers import Input, Dense
from keras.models import Model
import keras
input_inner = Input(shape=(4,), name='input_inner')
output_inner = Dense(3, name='inner_dense')(input_inner)
inner_model = Model(inputs=input_inner, outputs=output_inner)
input = Input(shape=(5,), name='input')
x = Dense(4, name='dense_1')(input)
x = inner_model(x)
x = Dense(2, name='dense_2')(x)
output = keras.layers.concatenate([x, x], name='concat_1')
model = Model(inputs=input, outputs=output)
print(model.summary())
yields the following output
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
input (InputLayer)               (None, 5)             0
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 4)             24          input[0][0]
____________________________________________________________________________________________________
model_1 (Model)                  (None, 3)             15          dense_1[0][0]
____________________________________________________________________________________________________
dense_2 (Dense)                  (None, 2)             8           model_1[1][0]
____________________________________________________________________________________________________
concat_1 (Concatenate)           (None, 4)             0           dense_2[0][0]
                                                                   dense_2[0][0]
My question concerns the content of the Connected to column.
I understand that a layer can have multiple nodes.
The notation of this column is layer_name[node_index][tensor_index].
If we regard inner_model as a layer I would expect it to have only one node, so I would expect dense_2 to be connected to model_1[0][0]. But in reality it is connected to model_1[1][0]. Why is this the case?
1. Background
When you say:
If we regard inner_model as a layer I would expect it to have only one
node
This is true in the sense that it has only one node which is part of the network.
Consider the github repository of the model.summary function. The function that prints the connections is print_layer_summary_with_connections (line 76), and it considers only the nodes from relevant_nodes array. All the nodes that are not in this array are considered not part of the network, and so the function skips them. The relevant lines are lines 88-90:
if relevant_nodes and node not in relevant_nodes:
    # node is not part of the current network
    continue
2. Your model
Now let's see what happens with your particular model. First let us define relevant_nodes:
relevant_nodes = []
for v in model.nodes_by_depth.values():
    relevant_nodes += v
The array relevant_nodes looks like:
[<keras.engine.topology.Node at 0x9dfa518>,
<keras.engine.topology.Node at 0x9dfa278>,
<keras.engine.topology.Node at 0x9d8bac8>,
<keras.engine.topology.Node at 0x9d8ba58>,
<keras.engine.topology.Node at 0x9d74518>]
However, when we print the inbound nodes at every layer, we will get:
for i in model.layers:
    print(i.inbound_nodes)
[<keras.engine.topology.Node object at 0x0000000009D74518>]
[<keras.engine.topology.Node object at 0x0000000009D8BA58>]
[<keras.engine.topology.Node object at 0x0000000009D743C8>, <keras.engine.topology.Node object at 0x0000000009D8BAC8>]
[<keras.engine.topology.Node object at 0x0000000009DFA278>]
[<keras.engine.topology.Node object at 0x0000000009DFA518>]
You can see that there is exactly one node in the list above that does not appear in relevant_nodes. This is the node in position 0 in the third array:
<keras.engine.topology.Node object at 0x0000000009D743C8>
It was not considered a part of the model, and hence did not appear in relevant_nodes. The node in position 1 in this array does appear in relevant_nodes, and this is why you see it as model_1[1][0].
3. The reason
The reason is basically the call that applies inner_model to a new tensor (x = inner_model(x) in your model). The same happens even if you run a much smaller model, such as the one below:
input_inner = Input(shape=(4,), name='input_inner')
output_inner = Dense(3, name='inner_dense')(input_inner)
inner_model = Model(inputs=input_inner, outputs=output_inner)
input = Input(shape=(5,), name='input')
output = inner_model(input)
model = Model(inputs=input, outputs=output)
You will see that relevant_nodes contains two elements, while via
for i in model.layers:
    print(i.inbound_nodes)
you'll get three nodes.
This is because layer 1 (of the smaller model above) has two nodes, but only the second one is considered part of the model. In particular, if you print the input at each one of the nodes at layer 1 with layer.get_input_at(node_index), you'll get:
print(model.layers[1].get_input_at(0))
print(model.layers[1].get_input_at(1))
#prints
/input_inner
/input
4. Answers to the questions in the comment
1) Do you also know what this non-relevant node is good for / where it
comes from?
This node seems to be an "internal node" created during the application of inner_model. In particular, if you print the input and output shape at each one of the three nodes (in the small model above), you get:
nodes = [model.layers[0].inbound_nodes[0], model.layers[1].inbound_nodes[0], model.layers[1].inbound_nodes[1]]
for i in nodes:
    print(i.input_shapes)
    print(i.output_shapes)
    print(" ")
#prints
[(None, 5)]
[(None, 5)]
[(None, 4)]
[(None, 3)]
[(None, 5)]
[(None, 3)]
So you can see that the shapes of the middle node (the one that does not appear in the list of relevant nodes) correspond to the shapes in inner_model.
2) Will an inner model with n output nodes always present them with node
indices 1 to n instead of 0 to n-1?
I am not sure whether this is always the case, since I guess there are various ways to end up with several output nodes, but for the following quite natural generalization of the small model above it is indeed the case:
input_inner = Input(shape=(4,), name='input_inner')
output_inner = Dense(3, name='inner_dense')(input_inner)
inner_model = Model(inputs=input_inner, outputs=output_inner)
input = Input(shape=(5,), name='input')
output = inner_model(input)
output = inner_model(output)
model = Model(inputs=input, outputs=output)
print(model.summary())
Here I just added output = inner_model(output) to the small model. The list of relevant nodes is
[<keras.engine.topology.Node at 0xd10c390>,
<keras.engine.topology.Node at 0xd10c9b0>,
<keras.engine.topology.Node at 0xd10ca20>]
and the list of all inbound nodes is
[<keras.engine.topology.Node object at 0x000000000D10CA20>]
[<keras.engine.topology.Node object at 0x000000000D10C588>, <keras.engine.topology.Node object at 0x000000000D10C9B0>, <keras.engine.topology.Node object at 0x000000000D10C390>]
Indeed the node indices are 1 and 2, as you mentioned in the comment. It will continue similarly if I add another output = inner_model(output), with node indices being 1,2,3 and so on.
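The counting behaviour described above can be imitated with a toy class. This is only an illustrative sketch, not Keras code: ToyLayer is a made-up stand-in that records one node per call, and the assumption (taken from the observations in this answer) is that building the inner model on its own input already accounts for node 0.

```python
class ToyLayer:
    """Records one node per call, loosely mimicking Keras' inbound nodes."""

    def __init__(self, name):
        self.name = name
        self.inbound_nodes = []

    def __call__(self, source):
        # Each call appends a node; its position in the list is the node index.
        self.inbound_nodes.append(source)
        node_index = len(self.inbound_nodes) - 1
        return f"{self.name}[{node_index}][0]"


inner_model = ToyLayer("model_1")

# Call 0 stands for building the inner model on its own input.
print(inner_model("input_inner"))  # model_1[0][0]
# Call 1 stands for applying the inner model inside the outer model.
print(inner_model("dense_1"))      # model_1[1][0]
# A further application gets index 2.
print(inner_model("model_1"))      # model_1[2][0]
```

In this picture the outer model only ever sees the nodes with index 1 and higher, which matches the model_1[1][0] entry in the summary.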
Updated in Sep 2020. The selected answer is a bit outdated (the link no longer points to the right place) and does not exactly answer the question: why is the node index 1 in model_1[1][0]? Here is what I found.
The code I experimented with is below (I added names to the layers for readability). You can copy and run it to see the output.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
input_inner = layers.Input(shape=(4,), name='inn_input')
output_inner = layers.Dense(3, name='inn_dense')(input_inner)
inner_model = keras.Model(inputs=input_inner, outputs=output_inner,name='inn_model')
inn_allLayers = inner_model.layers
# print(type(inn_allLayers))
print(inner_model.name,': total layer number:',len(inn_allLayers))
for i in inn_allLayers:
    print(i.name, i)
    print(len(i._inbound_nodes))
    for n in i._inbound_nodes:
        print(n.get_config())
        print(n)
    print('===================')
print('************************************************')
nest_input = layers.Input(shape=(5,), name='nest_input')
nest_d1_out = layers.Dense(4, name='nest_dense_1')(nest_input)
nest_m_out = inner_model(nest_d1_out)
nest_d2_out = layers.Dense(2, name='nest_dense_2')(nest_m_out)
nest_add_out = layers.concatenate([nest_d2_out, nest_d2_out], name='nest_concat')
model = keras.Model(inputs=nest_input, outputs=nest_add_out,name='nest_model')
inn_allLayers = inner_model.layers
# print(type(inn_allLayers))
print(inner_model.name,': total layer number:',len(inn_allLayers))
for i in inn_allLayers:
    print(i.name, i)
    print(len(i._inbound_nodes))
    for n in i._inbound_nodes:
        print(n.get_config())
        print(n)
    print('===================')
print('************************************************')
allLayers = model.layers
# print(type(allLayers))
print(model.name,': total layer number:',len(allLayers))
for i in allLayers:
    print(i.name, i)
    print(len(i._inbound_nodes))
    for n in i._inbound_nodes:
        print(n.get_config())
        print(n)
    print('===================')
for op in tf.get_default_graph().get_operations():
    print(str(op.name))
1. [1][0] represents [node_index][tensor_index]
2. what is node_index?
Under tensorflow/python/keras/engine/base_layer.py, it's described in this class:
class KerasHistory(
    collections.namedtuple('KerasHistory',
                           ['layer', 'node_index', 'tensor_index'])):
  """Tracks the Layer call that created a Tensor, for Keras Graph Networks.
  During construction of Keras Graph Networks, this metadata is added to
  each Tensor produced as the output of a Layer, starting with an
  `InputLayer`. This allows Keras to track how each Tensor was produced, and
  this information is later retraced by the `keras.engine.Network` class to
  reconstruct the Keras Graph Network.
  Attributes:
    layer: The Layer that produced the Tensor.
    node_index: The specific call to the Layer that produced this Tensor. Layers
      can be called multiple times in order to share weights. A new node is
      created every time a Tensor is called.
    tensor_index: The output index for this Tensor. Always zero if the Layer
      that produced this Tensor only has one output. Nested structures of
      Tensors are deterministically assigned an index via `nest.flatten`.
  """
  # Added to maintain memory and performance characteristics of `namedtuple`
  # while subclassing.
It says a node is created each time a Tensor is called, which is a bit vague to me. My understanding is that when a layer is called it produces a Tensor, and every call to the layer creates a new node (some printed results are shown later).
3. How to print each node?
Under the same py file, there is this snippet:
# Create node, add it to inbound nodes.
Node(
    self,
    inbound_layers=inbound_layers,
    node_indices=node_indices,
    tensor_indices=tensor_indices,
    input_tensors=input_tensors,
    output_tensors=output_tensors,
    arguments=arguments)
# Update tensor history metadata.
# The metadata attribute consists of
# 1) a layer instance
# 2) a node index for the layer
# 3) a tensor index for the node.
# The allows layer reuse (multiple nodes per layer) and multi-output
# or multi-input layers (e.g. a layer can return multiple tensors,
# and each can be sent to a different layer).
for i, tensor in enumerate(nest.flatten(output_tensors)):
  tensor._keras_history = KerasHistory(self,
                                       len(self._inbound_nodes) - 1, i)
Here self refers to the Layer object. The information is recorded in each tensor's _keras_history attribute and in the layer's self._inbound_nodes attribute. Hence we can print a specific node with print(layer._inbound_nodes[index_of_node].get_config()). The runnable code at the beginning already does this.
(What are inbound and outbound nodes? It is a bit confusing at first glance, but it may be easier if you imagine each node as an arrow pointing from one layer to another. The description from the code is below.)
class Node(object):
  """A `Node` describes the connectivity between two layers.
  Each time a layer is connected to some new input,
  a node is added to `layer._inbound_nodes`.
  Each time the output of a layer is used by another layer,
  a node is added to `layer._outbound_nodes`.
  Arguments:
      outbound_layer: the layer that takes
          `input_tensors` and turns them into `output_tensors`
          (the node gets created when the `call`
          method of the layer was called).
      inbound_layers: a list of layers, the same length as `input_tensors`,
          the layers from where `input_tensors` originate.
      node_indices: a list of integers, the same length as `inbound_layers`.
          `node_indices[i]` is the origin node of `input_tensors[i]`
          (necessary since each inbound layer might have several nodes,
          e.g. if the layer is being shared with a different data stream).
      tensor_indices: a list of integers,
          the same length as `inbound_layers`.
          `tensor_indices[i]` is the index of `input_tensors[i]` within the
          output of the inbound layer
          (necessary since each inbound layer might
          have multiple tensor outputs, with each one being
          independently manipulable).
      input_tensors: list of input tensors.
      output_tensors: list of output tensors.
      arguments: dictionary of keyword arguments that were passed to the
          `call` method of the layer at the call that created the node.
  `node_indices` and `tensor_indices` are basically fine-grained coordinates
  describing the origin of the `input_tensors`.
  A node from layer A to layer B is added to:
    - A._outbound_nodes
    - B._inbound_nodes
  """
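The bookkeeping in that docstring can be sketched with two toy classes. This is an illustration only, not the TensorFlow implementation: ToyLayer and ToyNode are made-up names, and the sketch keeps just the part of the docstring that says where a node from layer A to layer B is registered.

```python
class ToyLayer:
    """Holds the two node lists that the docstring talks about."""

    def __init__(self, name):
        self.name = name
        self.inbound_nodes = []
        self.outbound_nodes = []


class ToyNode:
    """An arrow from one or more inbound layers to a single outbound layer."""

    def __init__(self, outbound_layer, inbound_layers):
        self.outbound_layer = outbound_layer
        self.inbound_layers = inbound_layers
        # The layer that consumes the tensors stores the node as inbound.
        outbound_layer.inbound_nodes.append(self)
        # Each layer that produced the tensors stores the node as outbound.
        for layer in inbound_layers:
            layer.outbound_nodes.append(self)


a = ToyLayer("A")
b = ToyLayer("B")
node = ToyNode(outbound_layer=b, inbound_layers=[a])

print(node in a.outbound_nodes)  # True
print(node in b.inbound_nodes)   # True
print(len(a.inbound_nodes))      # 0
```

The naming is easy to trip over: the node's outbound_layer is B, the layer the arrow points to, and that layer stores the node in its inbound list.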
4. Observe node creation.
You might notice that the code contains two identical print blocks for inner_model: one before the nested model is built and one after.
The output is as below:
inn_model : total layer number: 2
inn_input <tensorflow.python.keras.engine.input_layer.InputLayer object at 0x7fd1c6755780>
1
{'outbound_layer': 'inn_input', 'inbound_layers': [], 'node_indices': [], 'tensor_indices': []}
<tensorflow.python.keras.engine.base_layer.Node object at 0x7fd1d2e75e10>
===================
inn_dense <tensorflow.python.keras.layers.core.Dense object at 0x7fd1d2e75e80>
1
{'outbound_layer': 'inn_dense', 'inbound_layers': 'inn_input', 'node_indices': 0, 'tensor_indices': 0}
<tensorflow.python.keras.engine.base_layer.Node object at 0x7fd1d2e92550>
===================
************************************************
inn_model : total layer number: 2
inn_input <tensorflow.python.keras.engine.input_layer.InputLayer object at 0x7fd1c6755780>
1
{'outbound_layer': 'inn_input', 'inbound_layers': [], 'node_indices': [], 'tensor_indices': []}
<tensorflow.python.keras.engine.base_layer.Node object at 0x7fd1d2e75e10>
===================
inn_dense <tensorflow.python.keras.layers.core.Dense object at 0x7fd1d2e75e80>
2
{'outbound_layer': 'inn_dense', 'inbound_layers': 'inn_input', 'node_indices': 0, 'tensor_indices': 0}
<tensorflow.python.keras.engine.base_layer.Node object at 0x7fd1d2e92550>
{'outbound_layer': 'inn_dense', 'inbound_layers': 'nest_dense_1', 'node_indices': 0, 'tensor_indices': 0}
<tensorflow.python.keras.engine.base_layer.Node object at 0x7fd1d2ac4358>
===================
************************************************
You will notice immediately that after the nested model is built, one extra (inbound) node (an arrow) is created, pointing to inn_dense. The first node points from inn_input to inn_dense; the second points from nest_dense_1 to inn_dense. This is what was said earlier: each time a layer is called, a new node (an arrow) is created.
5. Question answered
So far, I think this already answers the original question of why the node index is 1 in [1][0]. It is because reusing inner_model causes the inner dense layer (inn_dense in the code above) to be used to create a Tensor a second time.
The rest of the code snippet prints some extra information; you can check it out to get a better idea of what happens under the hood.
It seems it is now "_nodes_by_depth" instead of "nodes_by_depth", and likewise for inbound_nodes etc. Maybe the answer has to be updated.