Improving BLEU score of custom model - microsoft-translator

I currently have barely 10,000 training sentences and I'd like to improve the BLEU score of the model. Is there a way to get more sentences, or to get around the 10,000 minimum?
If not, how do I improve the score with the 10,000 sentences I have?

The minimum of 10k sentences is required for training to complete. You can add data from any source in your desired language pair, but you really want training sentences that are relevant to the material you will be translating.

Related

Is it possible to train the sentiment classification model with the labeled data and then use it to predict sentiment on data that is not labeled?

I want to do sentiment analysis using a machine learning (text classification) approach, for example NLTK's Naive Bayes classifier.
The issue is that only a small amount of my data is labeled: for example, 100 articles are labeled positive or negative, while 500 articles are not labeled.
I was thinking that I could train the classifier on the labeled data and then try to predict the sentiments of the unlabeled data.
Is this possible?
I am a beginner in machine learning and don't know much about it.
I am using Python 3.7.
Thank you in advance.
Is it possible to train the sentiment classification model with the labeled data and then use it to predict sentiment on data that is not labeled?
Yes. This is basically the definition of what supervised learning is.
I.e. you train on data that has labels, so that you can then put the model into production categorizing your data that does not have labels.
(Any book on supervised learning will have code examples.)
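A minimal sketch with NLTK's NaiveBayesClassifier (which the question mentions); the two example articles and the bag-of-words features below are my own placeholders:

```python
import nltk

def features(text):
    # Bag-of-words features: which words appear in the article.
    return {word: True for word in text.lower().split()}

# Placeholder labeled articles; in practice, your 100 labeled ones go here.
labeled = [("great product, works really well", "pos"),
           ("terrible experience, it broke after a day", "neg")]

train_set = [(features(text), label) for text, label in labeled]
classifier = nltk.NaiveBayesClassifier.train(train_set)

# Predict the sentiment of an unlabeled article.
print(classifier.classify(features("works great, very happy with it")))
```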
I wonder if your question might really be: can I use supervised learning to make a model, assign labels to another 500 articles, then do further machine learning on all 600 articles? Well the answer is still yes, but the quality will fall somewhere between these two extremes:
Assign random labels to the 500. Bad results.
Get a domain expert to assign correct labels to those 500. Good results.
Your model could fall anywhere between those two extremes. It is useful to know where it falls, so you know whether the data is worth using. You can get an estimate of that by taking a sample, say 25 records, and having them also labeled by a domain expert. If all 25 match, there is a reasonable chance your other 475 records have also been given good labels. If e.g. only 10 of the 25 match, the model is much closer to the random end of the spectrum, and using the other 475 records is probably a bad idea.
("10", "25", etc. are arbitrary examples; choose based on the number of different labels, and your desired confidence in the results.)

Large number of training steps results in poor performance in transfer learning

I have a question. I have used transfer learning to retrain GoogLeNet on my image classification problem. I have 80,000 images which belong to 14 categories, and I set the number of training steps to 200,000. I believe the code provided by TensorFlow has dropout implemented, and it trains with random shuffling of the dataset and a cross-validation approach. I do not see any overfitting in the training and classification curves, and I get high cross-validation accuracy and high test accuracy, but when I apply my model to a new dataset I get poor classification results. Does anybody know what is going on? Thanks!

How is unsupervised deep learning used in sentiment analysis?

Typically, text classification, including sentiment analysis, can be performed in one of 2 ways: 1. supervised learning, when there is enough training data, and 2. unsupervised learning, when there is no pre-labeled training data.
I have only a collection of tweets which contains just the text (reviews); there is no polarity for each tweet.
My question is: is there any method to do sentiment analysis on this data using unsupervised learning?
Thank you for your help.
(Based on your comment, I've concentrated on the "unsupervised" part of your question, and ignored deep learning.)
If you use something like SentiWordNet you can assign a positive or negative score to each word in a tweet, and then (as the simplest approach) sum them to get a single sentiment number for each tweet.
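A minimal sketch of that idea using NLTK's SentiWordNet interface (taking only the first sense of each word is a simplification on my part):

```python
# Requires: nltk.download('wordnet') and nltk.download('sentiwordnet')
from nltk.corpus import sentiwordnet as swn

def tweet_score(tweet):
    # Sum positive-minus-negative word scores to get one number per tweet.
    score = 0.0
    for word in tweet.lower().split():
        senses = list(swn.senti_synsets(word))
        if senses:
            # Simplification: use only the first (most common) sense of the word.
            score += senses[0].pos_score() - senses[0].neg_score()
    return score

print(tweet_score("i love this great phone"))    # > 0, positive
print(tweet_score("what a terrible awful day"))  # < 0, negative
```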
At this point it doesn't really matter whether you are doing supervised or unsupervised learning, as either way you will have a score for each tweet, and can divide the tweets into, say, positive, neutral and negative sentiment. What the supervised data (the class label) does allow is an error estimate of how well the classification has done.
If you want an error estimate when your training data has no classes, you could evaluate some percentage of the tweets yourself. Even just doing 30 of them will start to give you an idea of where your grouping algorithm is on the scale from random to perfect, and won't take long.

MALLET Ranking of Words in a topic

I am relatively new to MALLET and need to know:
- are the words in each topic that MALLET produces rank-ordered in some way?
- if so, what is the ordering? E.g. is the 1st word in a topic list the one with the highest distribution across the corpus?
Thanks!
They are ranked based on probabilities from the training, i.e. the first word is the most probable to appear in this topic, the 2nd is less probable, the 3rd less so, and so on. These probabilities are not directly related to term frequencies, although surely the words with the highest TF-IDF weights are more likely to be the most probable. Also, Gibbs sampling has a lot to do with how words are ranked in topics: due to randomness in sampling you can get quite different probabilities for words within topics. Try, for example, saving the model and then retraining using the --input-model option; the topics will look very much alike, but not the same.
That said, if you need to see the actual weights of terms in the corpus, unrelated to LDA, you can use something like NLTK in Python to check frequency distributions, and something like scikit-learn's TF-IDF to get more meaningful weight distributions.
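A short sketch of both (the two toy documents are my own):

```python
from nltk import FreqDist
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat on the mat", "the dog chased the cat"]

# Raw term frequencies across the corpus.
freq = FreqDist(word for doc in docs for word in doc.split())
print(freq.most_common(3))  # e.g. [('the', 4), ('cat', 2), ('sat', 1)]

# TF-IDF weights: terms that appear in every document are down-weighted.
vec = TfidfVectorizer()
weights = vec.fit_transform(docs)
for term, col in sorted(vec.vocabulary_.items()):
    print(term, round(weights[0, col], 3))  # each term's weight in the first document
```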

Rapidminer decision tree using cross validation

I am using the ten fold cross validation operator. I am using RapidMiner for the first time, so I am confused: will I get 10 decision trees as a result? I have read that the accuracy is the average of all results, so what is the final output? The average of all?
The aim of cross validation is to output a prediction about the performance a model will produce when presented with unseen data.
For the 10 fold case, the data is split into 10 partitions. There are 10 possible ways to take 9/10 of the data as a training set, and these are used to build 10 models. Each model is then applied to the one partition it did not see, to produce a performance estimate. The 10 performances are averaged. The end result is an average that is a reasonable estimate of the performance of a model on unseen data.
The remaining question is: what is the final model? The best answer is to use a model built on all the data, and to assume it is close enough to the 10 models used to generate the average estimate.
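The same procedure sketched in scikit-learn rather than RapidMiner (my own illustration, using the built-in iris data):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0)

# 10 models are built and scored on their held-out partitions...
scores = cross_val_score(tree, X, y, cv=10)
print(scores.mean())  # ...and the averaged score is the performance estimate.

# The model actually used afterwards is fit on all of the data.
final_model = tree.fit(X, y)
```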