Include file in caffe .prototxt

I am building a siamese network from the example in BLVC's site
there they use a simple convolutonal net to generate the features for the contrastive loss function, this is done by copy and pasting the .prototxt of each of the networks in the .prototxt of the final siamese network, the problem is I am using a much larger network, the .prototxt having about 5700 lines.
Is there a directive that allows me to tell it to just "include" that file in runtime? Something in the lines of "input" in LATEX so I don't have a 12k+ lines file.
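Caffe's prototxt format has no include directive, so a common workaround is to generate the combined file with a small build script instead of maintaining it by hand. A minimal sketch, assuming plain concatenation of your fragments produces a valid network definition (the file names here are hypothetical placeholders for your own pieces):

```python
from pathlib import Path

def assemble_prototxt(parts, out_path):
    """Concatenate prototxt fragments into one network definition file."""
    text = "\n".join(Path(p).read_text() for p in parts)
    Path(out_path).write_text(text)
    return text

# Example (hypothetical file names):
# assemble_prototxt(
#     ["header.prototxt", "branch.prototxt", "loss.prototxt"],
#     "siamese_full.prototxt",
# )
```

Rerunning the script after editing a fragment keeps the branch definition in one place, which is the main thing the 12k-line file loses.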

Related

How to write a configuration file to tell the AllenNLP trainer to randomly split dataset into train and dev

The official AllenNLP documentation suggests specifying "validation_data_path" in the configuration file, but what if one wants to construct a dataset from a single source and then randomly split it into train and validation sets with a given ratio?
Does AllenNLP support this? I would greatly appreciate your comments.
AllenNLP does not have this functionality yet, but we are working on some stuff to get there.
In the meantime, here is how I did it for the VQAv2 reader: https://github.com/allenai/allennlp-models/blob/main/allennlp_models/vision/dataset_readers/vqav2.py#L354
This reader supports Python slicing syntax where, for example, you specify a data_path of "my_source_file[:1000]" to take the first 1000 instances from my_source_file. You can also supply multiple paths by setting data_path: ["file1", "file2[:1000]", "file3[1000:]"]. You can probably lift the top two blocks in that file (lines 354 to 369) into your own dataset reader to achieve the same result.
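Until that lands in AllenNLP itself, the split can also be done in your own reader before instances are handed to the trainer. A minimal stdlib sketch (the function name and ratio are hypothetical, not AllenNLP API) of a reproducible random train/dev split:

```python
import random

def train_dev_split(instances, dev_ratio=0.1, seed=13):
    """Shuffle and split a sequence of instances into (train, dev) by ratio."""
    items = list(instances)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible across runs
    rng.shuffle(items)
    n_dev = int(len(items) * dev_ratio)
    return items[n_dev:], items[:n_dev]
```

With a fixed seed, every epoch and every restart sees the same train/dev partition, which matters when you compare runs.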

Is there a convenient method to only save model architecture information in Pytorch to a protobuf ruled file?

I know how to use torch.save to save both the weights and the net at the same time, as a dictionary-structured file. But can I save the network architecture alone to a separate file, the way Caffe's .prototxt describes a net before training begins? The file may be used somewhere else. Can ONNX do something like that?

Is it possible to include one Caffe's prototxt file in another?

Caffe requires at least three .prototxt files: one for training, one for deployment, and one to define solver parameters.
My training and deployment files contain identical pieces describing the network architecture. Is it possible to refactor this by moving the common part out of them into a separate file?
You are looking for an "all-in-one" network.
See this GitHub discussion for more information.
Apparently, you can achieve this by using not only include { phase: XXX }, but also by taking advantage of stage and state.
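A minimal sketch of the all-in-one pattern, assuming the usual LMDB data sources (layer names and sources here are hypothetical): only the input layers differ per phase/stage, while every shared layer below them is written once.

```
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  data_param { source: "train_lmdb" batch_size: 64 backend: LMDB }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TEST stage: "val" }
  data_param { source: "val_lmdb" batch_size: 50 backend: LMDB }
}
layer {
  name: "data"
  type: "Input"
  top: "data"
  include { phase: TEST stage: "deploy" }
  input_param { shape: { dim: 1 dim: 3 dim: 227 dim: 227 } }
}
# ...shared architecture layers follow, written only once...
```

At deploy time you select the right variant through the net state (phase and stages) instead of keeping a second copy of the architecture.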

Do CNNs (Convolution Neural Networks) require a CSV file?

I am trying to do some image classification using TensorFlow, and I'm using a CNN. I have a CSV file for the images, but I was wondering if I need a CSV file when I load the dataset (images), or will the CNN do the classification by itself without one. I'm pretty new to Machine Learning and TensorFlow, so some details would be helpful.
Not really sure exactly what you are asking, but I think the answer to your question is: no, you do not require a CSV file. If you write a program that loads the images together with their labels, you should be fine!
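One common label source that needs no CSV at all is the directory layout itself: put each class's images in a folder named after the class (this is how, e.g., Keras's directory-based loaders infer labels). A stdlib sketch of the idea (directory and class names hypothetical):

```python
from pathlib import Path

def collect_labeled_images(root):
    """Infer labels from folder names: root/<class_name>/<image>.png."""
    pairs = []
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            for img in sorted(class_dir.glob("*.png")):
                # the folder name doubles as the image's class label
                pairs.append((str(img), class_dir.name))
    return pairs
```

The (path, label) pairs can then be fed to whatever TensorFlow input pipeline you use, with no separate labels file.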

How do I train tesseract 4 with image data instead of a font file?

I'm trying to train Tesseract 4 with images instead of fonts.
In the docs they are explaining only the approach with fonts, not with images.
I know how it works when using a prior version of Tesseract, but I didn't get how to use the box/tiff files to train with LSTM in Tesseract 4.
I looked into tesstrain.sh, which is used to generate LSTM training data but couldn't find anything helpful. Any ideas?
Clone the tesstrain repo at https://github.com/tesseract-ocr/tesstrain.
You'll also need to clone the tessdata_best repo, https://github.com/tesseract-ocr/tessdata_best, which acts as the starting point for your training. It takes hundreds of thousands of samples of training data to reach good accuracy, so using a good starting point lets you fine-tune with much less data (tens to hundreds of samples can be enough).
Add your training samples to the directory in the tesstrain repo named ./tesstrain/data/my-custom-model-ground-truth
Your training samples should be image/text file pairs that share the same name but different extensions. For example, you should have an image file named 001.png that is a picture of the text foobar and you should have a text file named 001.gt.txt that has the text foobar.
These files need to be single lines of text.
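Before kicking off training it is worth checking that every image actually has its transcription, since an unpaired image will trip up the pipeline. A small stdlib sketch (the directory path is whatever you used above):

```python
from pathlib import Path

def missing_transcriptions(gt_dir):
    """Return stems of .png images that lack a matching .gt.txt file."""
    gt = Path(gt_dir)
    return [p.stem for p in sorted(gt.glob("*.png"))
            if not (gt / (p.stem + ".gt.txt")).exists()]

# Example:
# missing_transcriptions("tesstrain/data/my-custom-model-ground-truth")
```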
In the tesstrain repo, run this command:
make training MODEL_NAME=my-custom-model START_MODEL=eng TESSDATA=~/src/tessdata_best
Once the training is complete, there will be a new file, tesstrain/data/my-custom-model.traineddata. Copy that file to the directory where Tesseract searches for models; on my machine it was /usr/local/share/tessdata/.
Then, you can run tesseract and use that model as a language.
tesseract -l my-custom-model foo.png -