Custom Translator BLEU score for baseline model - microsoft-translator

I uploaded training/tuning/test sets and successfully trained a Custom Translator. A BLEU score of 28.5 is displayed in a red box in the Model tab of the web UI. I can download the translated test set along with the source and reference data from the Test tab in the model training details.
What I would like to know is whether this BLEU score is better or worse than the score I could achieve with the non-customized Microsoft Translator, and by how much. Does the red color mean it is worse?
I know that I can call the Microsoft Translator API to get the non-customized translations for the test set and then run a BLEU scorer to get the difference, but I was hoping to see this in the web UI.
Thanks in advance!
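For what it's worth, the manual route described above is fairly short to script. The sketch below assumes the Translator Text API v3.0 REST endpoint and the sacrebleu package; the key, region, language codes and file names are placeholders, and request-size limits/batching are ignored.

import requests
import sacrebleu

KEY = "<translator-key>"            # placeholder
REGION = "<resource-region>"        # placeholder
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"

def baseline_translate(texts, src="de", tgt="en"):
    # No 'category' parameter, so the general (non-customized) model is used.
    headers = {"Ocp-Apim-Subscription-Key": KEY,
               "Ocp-Apim-Subscription-Region": REGION,
               "Content-Type": "application/json"}
    params = {"api-version": "3.0", "from": src, "to": tgt}
    resp = requests.post(ENDPOINT, headers=headers, params=params,
                         json=[{"Text": t} for t in texts])
    resp.raise_for_status()
    return [item["translations"][0]["text"] for item in resp.json()]

# Placeholder file names: one sentence per line, source and reference aligned.
with open("test.src", encoding="utf-8") as f:
    source = [line.strip() for line in f]
with open("test.ref", encoding="utf-8") as f:
    references = [line.strip() for line in f]

hypotheses = baseline_translate(source)
print(sacrebleu.corpus_bleu(hypotheses, [references]).score)

Note that BLEU implementations and tokenization differ, so a score computed this way is only comparable to a score for the custom model computed with the same scorer on the same test set, not directly to the number shown in the portal.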

Related

How to perform OCR on a selected part of an image

I'm a student working on a computer vision project for the automation of the underwriting process.
Following is an example of an old Sri Lankan National Identity Card. I want to extract the identity card number from the image. When I perform OCR on the full image using pytesseract, it cannot extract the number properly, but when I crop just the number and perform OCR, it successfully identifies it. This is not just a single use case; it should be automated: when a user enters his identification number, my program should be able to extract the number from the identification document and cross-check it with what the user has entered. I'm stuck at the point where the extraction happens.
I got a suggestion to create and train a custom object detection model with YOLO, use it to identify the part where the NIC number is located, crop it, and then perform OCR. I wonder whether there is an easier way of doing this? I just need to extract the identification number from the image.
A sample image of an old NIC is provided for your reference.
"I got a suggestion to create and train a custom object detection model with yolo"
Using Yolo for detecting where the ID number is an overkill.
Some suggestions you can try.
Detect face using the opencv face-detector.
Detect the thick orange line using opencv threshold function on one of the color channels.
Then crop the area above the detected face/line for the id number.
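Putting the face-detector and crop-above suggestions together, here is a rough sketch of what that pipeline could look like with OpenCV's bundled Haar cascade and pytesseract; the crop margin, the thresholding step and the character whitelist are guesses that would need tuning for the actual card layout.

import cv2
import pytesseract

img = cv2.imread("nic.jpg")                       # placeholder file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect the face with the Haar cascade shipped with OpenCV.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

if len(faces) > 0:
    x, y, w, h = faces[0]
    # Take a band above the detected face; adjust the margin for your layout.
    top = max(0, y - int(1.5 * h))
    crop = gray[top:y, :]
    # Otsu threshold to clean up the background before OCR.
    _, crop = cv2.threshold(crop, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Whitelist assumes the old digits-plus-V/X format; adjust if needed.
    text = pytesseract.image_to_string(
        crop, config="--psm 7 -c tessedit_char_whitelist=0123456789VvXx")
    print(text.strip())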

Am I misunderstanding how to reuse a trained model later for a similar task (both are translation, but with different language pairs)?

I am currently training an mT5 model on Spanish-to-English (and vice versa) translation. That works, with a BLEU of around 40 in both directions. I then want to use that same trained model to improve translation from Spanish into a low-resource language.
The way I'm doing it, I go through the whole translation process and then evaluate. After that I reload the model (since I'm doing this on Google Colab and by then it has nearly timed out, so I need to restart the runtime) by doing:
model = AutoModel.from_pretrained("/content/drive/checkpoint-11076-epoch-1")
Then I try to train with the new data by doing:
model.train_model(train_df, eval_data=eval_df)
(with new training and evaluation data frames for the new language pair). However, it won't train for long, maybe a few seconds.
I know this can be done because I've read about people training on one language pair and then another, but maybe I'm missing something crucial?
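Judging by the train_model call, this looks like the simpletransformers wrapper rather than bare transformers; if that's the case, one possible shape of the reload-and-continue step is sketched below. The checkpoint path, file names and hyperparameters are placeholders; the points worth noting are reloading through the wrapper class (so the tokenizer and training arguments come along with the weights) and setting num_train_epochs explicitly, since the default of a single epoch over a small dataset can finish in seconds.

import pandas as pd
from simpletransformers.t5 import T5Model, T5Args

args = T5Args()
args.num_train_epochs = 5                 # default is 1
args.output_dir = "outputs_es_lowres/"    # placeholder
args.overwrite_output_dir = True

# Reload the checkpoint saved from the Spanish<->English run through the wrapper.
model = T5Model("mt5", "/content/drive/checkpoint-11076-epoch-1", args=args)

# train_df / eval_df need the columns simpletransformers expects:
# "prefix", "input_text", "target_text".
train_df = pd.read_csv("train_es_lowres.csv")   # placeholder
eval_df = pd.read_csv("eval_es_lowres.csv")     # placeholder

model.train_model(train_df, eval_data=eval_df)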

For the image captioning problem in computer vision, what happens if the trained model encounters an object that never occurs in the dataset?

I am just starting research on image captioning, which is a sub-domain of CV. As with other deep learning training processes, you have to train the model on a training set, and once the model is ready, you can use it. So my question about the image captioning problem is: what happens if the trained model encounters an object that never occurs in the dataset? Thanks for your replies!
One of two things is likely to happen:
The object is classified as something within the dataset categories and the text generator will pick up this class to build a sentence.
The object is not recognized and the rest of the frame is used to generate a sentence.
It depends on how closely related the actual class is to anything that's in the dataset. The first option is still more likely if the object dominates the area within the frame. You could set a manual threshold that discards class information below a certain confidence.
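As an illustration of that last point, here is a tiny sketch of the manual-threshold idea; the (label, score) format and the 0.5 cutoff are just placeholders for whatever the detector/classifier in the captioning pipeline actually produces.

def filter_detections(detections, min_confidence=0.5):
    # Keep only the (label, score) pairs the caption generator is allowed to use.
    return [(label, score) for label, score in detections if score >= min_confidence]

detections = [("dog", 0.91), ("skateboard", 0.34), ("person", 0.77)]
print(filter_detections(detections))   # [('dog', 0.91), ('person', 0.77)]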

Camera image recognition with small sample set

I need to visually recognise some flat pictures shown to a camera. There are not many of them (maybe 30), but discrimination may depend on details. The input may be partly obscured or shadowed and is subject to lighting changes.
The samples need to be updatable.
There are many existing frameworks for object detection, with the most reliable ones depending on deep learning methods (mostly convolutional networks). However, the pretrained models are of course not well optimised for discerning flat imagery, and even if I start training from scratch, updating the system for new samples would require a cumbersome training process, if I am right about how this works.
Is it possible to use deep learning while still keeping the sample pool flexible?
Is there any other well known reliable method to detect images from a small sample set?
One can use well-trained networks for visual classification like Inception or SqueezeNet, slice off the last layer(s), and add a simple statistical algorithm (for example k-nearest neighbour) that can be taught directly by the samples in a non-iterative fashion.
Most classification-related concerns like lighting and orientation insensitivity are then already handled by the pre-trained network, while the network's output keeps enough information to allow the statistical algorithm to decide the image class.
An implementation using k-nearest neighbour is shown here: https://teachablemachine.withgoogle.com/ ; the source is hosted here: https://github.com/googlecreativelab/teachable-machine .
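If you want to wire this up yourself rather than use Teachable Machine, here is a rough sketch of the same idea in Python, using a Keras MobileNetV2 backbone (instead of the Inception/SqueezeNet mentioned above, simply because Keras ships it with a pooled-feature option) and scikit-learn's k-nearest neighbour; the image size, k and the file names are placeholders.

import numpy as np
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing import image
from sklearn.neighbors import KNeighborsClassifier

# Classification head removed; pooled features come out instead of class scores.
backbone = MobileNetV2(weights="imagenet", include_top=False, pooling="avg")

def embed(path):
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return backbone.predict(x)[0]

# "Training" is just embedding the ~30 reference pictures and fitting k-NN,
# so adding or replacing a sample is instant -- no retraining of the network.
sample_paths = ["picture_01.jpg", "picture_02.jpg"]    # placeholder file names
sample_labels = ["picture_01", "picture_02"]
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit([embed(p) for p in sample_paths], sample_labels)

print(knn.predict([embed("camera_frame.jpg")]))        # placeholder query image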
Use transfer learning; you'll still need to build a training set, but you'll get better results than starting with random weights. Try to find a model trained on images similar to yours. You might also do some black-box testing of the selected model with your curated images to baseline its response curve to your images.
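For completeness, a minimal sketch of that transfer-learning route as well, assuming a frozen Keras MobileNetV2 backbone with a small classification head trained on the sample images arranged one folder per class; the directory layout, image size and epoch count are placeholders.

import tensorflow as tf

backbone = tf.keras.applications.MobileNetV2(
    weights="imagenet", include_top=False, pooling="avg")
backbone.trainable = False                 # keep the pretrained weights fixed

num_classes = 30                           # placeholder: one class per picture
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),   # MobileNetV2 expects [-1, 1]
    backbone,
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Assumes samples/ contains one sub-folder of images per picture to recognise.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "samples/", image_size=(224, 224), batch_size=8)
model.fit(train_ds, epochs=10)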

Caffe training uses face crop but deploy uses full image

I'm implementing this project and it is working fine. Now I wonder how it is possible that the training phase uses only a face crop of the image, but actual use can accept a full image with multiple people.
The model is trained to find a face within an image.
Training with face crops allows the training to converge faster, as it does not go through the trial-and-error to recognize -- and then learn to ignore -- other structures in the input images. The full capacity of the model topology can go toward facial features.
When you get to scoring ("actual use", a.k.a. inference), the model has no training for or against all the other stuff in each photo. It's trained to find faces, and will do that well.
Does that explain it well enough?