Predict scores of a product - regression

I want to predict scores for a product. Each product is scored out of 10 on 5 different features, such as robustness, style, nuance, modernity, and quality, and each score can be between 0 and 10. I have tried XGBRegressor wrapped in sklearn's MultiOutputRegressor to handle the multiple labels, and got a mean absolute error of 1.19 with a standard deviation of 0.036 after 10-fold CV. I am not sure whether this is a good score, or whether this is a good way of predicting these scores.
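For concreteness, here is a minimal sketch of the setup described, using only scikit-learn (GradientBoostingRegressor standing in for XGBRegressor, and a synthetic dataset standing in for the real product data, so the numbers are illustrative only):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the product data: 5 target scores per product.
X, y = make_regression(n_samples=200, n_features=8, n_targets=5,
                       noise=5.0, random_state=0)
# Rescale each target to the 0-10 range the question describes.
y = (y - y.min(axis=0)) / (y.max(axis=0) - y.min(axis=0)) * 10

# One independent regressor is fit per target, as MultiOutputRegressor does.
model = MultiOutputRegressor(GradientBoostingRegressor(n_estimators=50,
                                                       random_state=0))
scores = -cross_val_score(model, X, y, cv=10,
                          scoring="neg_mean_absolute_error")
print(f"MAE: {scores.mean():.2f} +/- {scores.std():.2f}")
```

Whether an MAE of 1.19 on a 0-10 scale is "good" depends on a baseline: compare it against always predicting each target's mean (e.g. sklearn's DummyRegressor) to see how much the model actually improves.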

Related

Best fit to model is a poor predictor

For fun, I was trying to make a predictor for how long it would take for George R. R. Martin's The Winds of Winter to be released. My "best" model was the one with the lowest combined AIC and BIC score (summed together). I tried polynomials of degree 0 up to something like 50; the best of these had a degree of 3 or 4.
In the resulting plot, the y-axis is the number of days since A Game of Thrones was released and the x-axis is the number of books released in the A Song of Ice and Fire series. Despite having the best (lowest) combined AIC and BIC score, this model is a poor predictor, extrapolating in a way that makes no temporal sense. This showed me that my notion that "the best predictor is the one with the lowest combined AIC and BIC score" is flawed. Where have I gone wrong in my thinking, and what kind of scoring criteria would be more appropriate?
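For reference, the selection procedure described can be sketched as follows (the release data here is made up for illustration; AIC and BIC are computed from the residual sum of squares of each least-squares polynomial fit):

```python
import numpy as np

# Hypothetical data: book number vs. days since the first book's release.
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([0, 600, 1500, 3000, 4000], dtype=float)
n = len(x)

results = {}
for degree in range(4):
    coeffs = np.polyfit(x, y, degree)          # least-squares polynomial fit
    rss = np.sum((np.polyval(coeffs, x) - y) ** 2)
    k = degree + 1                             # number of fitted parameters
    aic = n * np.log(rss / n) + 2 * k
    bic = n * np.log(rss / n) + k * np.log(n)
    results[degree] = aic + bic
    print(degree, results[degree])
```

Note that AIC and BIC only penalize in-sample fit against parameter count; with very few data points, a flexible polynomial can still win while extrapolating absurdly. A held-out or leave-one-out evaluation of the actual prediction target is a more direct criterion.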

Is it possible to train the sentiment classification model with the labeled data and then use it to predict sentiment on data that is not labeled?

I want to do sentiment analysis using a machine learning (text classification) approach, for example the nltk Naive Bayes classifier.
But the issue is that only a small amount of my data is labeled. (For example, 100 articles are labeled positive or negative, and 500 articles are not labeled.)
I was thinking that I could train the classifier on the labeled data and then try to predict the sentiment of the unlabeled data.
Is it possible?
I am a beginner in machine learning and don't know much about it.
I am using Python 3.7.
Thank you in advance.
Is it possible to train the sentiment classification model with the labeled data and then use it to predict sentiment on data that is not labeled?
Yes. This is basically the definition of what supervised learning is.
That is, you train on data that has labels, so that you can then put the model into production categorizing data that does not have labels.
(Any book on supervised learning will have code examples.)
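As a minimal sketch (using scikit-learn's MultinomialNB rather than nltk's NaiveBayesClassifier, which works analogously, and a made-up toy corpus standing in for your 100 labeled articles):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny illustrative corpus standing in for the labeled articles.
train_texts = ["great product, loved it",
               "terrible, waste of money",
               "excellent quality",
               "awful experience, very bad"]
train_labels = ["pos", "neg", "pos", "neg"]

# Bag-of-words features + Naive Bayes, trained only on labeled data.
vec = CountVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(train_texts), train_labels)

# Predict sentiment for unlabeled articles, exactly as you propose.
unlabeled = ["loved the excellent quality", "very bad, terrible"]
preds = clf.predict(vec.transform(unlabeled))
print(preds)
```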
I wonder if your question might really be: can I use supervised learning to make a model, assign labels to another 500 articles, then do further machine learning on all 600 articles? Well the answer is still yes, but the quality will fall somewhere between these two extremes:
Assign random labels to the 500. Bad results.
Get a domain expert assign correct labels to those 500. Good results.
Your model could fall anywhere between those two extremes. It is useful to know where it is, so you know whether it is worth using the data. You can get an estimate of that by taking a sample, say 25 records, and having them also labeled by a domain expert. If all 25 match, there is a reasonable chance your other 475 records have also been given good labels. If, e.g., only 10 of the 25 match, the model is much closer to the random end of the spectrum, and using the other 475 records is probably a bad idea.
("10", "25", etc. are arbitrary examples; choose based on the number of different labels, and your desired confidence in the results.)
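The spot-check itself is trivial to compute; something like this, where the label lists are purely illustrative:

```python
# Compare model labels with expert labels on a small audit sample.
model_labels  = ["pos", "neg", "pos", "pos", "neg",
                 "neg", "pos", "neg", "pos", "pos"]
expert_labels = ["pos", "neg", "pos", "neg", "neg",
                 "neg", "pos", "neg", "pos", "neg"]

# Fraction of the audit sample where model and expert agree.
agreement = sum(m == e for m, e in zip(model_labels, expert_labels)) \
            / len(expert_labels)
print(f"agreement: {agreement:.0%}")  # → agreement: 80%
```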

Can I use Restricted Boltzmann Machine for multiple regression output

My question - Can we apply an RBM to predict multiple regression outputs, such as course scores for a student, or the prices of several products that a customer buys?
So I was dealing with a multiple-output regression problem!
I came across a paper where the authors predicted GPA scores for students in different courses using a Restricted Boltzmann Machine (RBM). However, the paper did not detail how the RBM was implemented for the multiple-output regression problem!
Please note that I am planning to develop a single model for the task... that is how I landed here. I do not want multiple models!

What's the rule for training multiple levels of a game using DQNs?

I'm trying to create benchmarks for a variety of games that have 5 levels each. The goal is to train a model to convergence on 3 levels first, and then measure the learning curves on the remaining 2 levels.
Is there a general rule for how models should be trained on multiple levels? Should the training be done on one level after another?
Thanks very much for the help.
Suppose you are able to train for N levels in total (within the time constraints you may have).
I would not recommend the following setup:
Train N / 3 times on the first level
Train N / 3 times on the second level
Train N / 3 times on the third level
The risk with such a setup is that you first learn to play well on the first level, then forget everything you learned and "overfit" to the second level, and then forget again and overfit to the third level.
You'll want to make sure that you consistently keep a nice mix of levels throughout the entire training process, because the goal ultimately is to generalize and perform well on the (unseen) levels 4 and 5.
To do this, I'd recommend one of the following setups:
Train once on the first level
Train once on the second level
Train once on the third level
Repeat from step one again, until you've trained the maximum of N times
Alternatively:
Randomly select one of the first three levels to train.
Repeat until N times trained.
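Both schedules are easy to generate up front; a sketch (the function name and structure are just illustrative):

```python
import random

def train_schedule(levels, total_episodes, mode="cycle", seed=0):
    """Return the order in which levels are visited during training."""
    rng = random.Random(seed)
    if mode == "cycle":
        # Round-robin: level 1, 2, 3, 1, 2, 3, ...
        return [levels[i % len(levels)] for i in range(total_episodes)]
    # Uniform random selection among the training levels.
    return [rng.choice(levels) for _ in range(total_episodes)]

print(train_schedule([1, 2, 3], 6))            # → [1, 2, 3, 1, 2, 3]
print(train_schedule([1, 2, 3], 6, "random"))  # e.g. a shuffled mix
```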
It may be possible to do even better with more sophisticated strategies. For example, you could track your average performance per level over the last X times you've played each level, and prioritize levels on which you're not yet performing well (because apparently you still have a lot to learn in those). This could, for instance, be done with a Multi-Armed Bandit strategy such as UCB1, where you use the negative recent performance as a "reward".
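A sketch of that UCB1-style selection (the agent's recent performance, negated, acts as the arm's value; all names here are illustrative):

```python
import math

def ucb1_pick(counts, values, c=2.0):
    """Pick the level with the highest UCB1 score.

    counts[i] = how often level i has been trained on.
    values[i] = average 'reward' for level i, i.e. the NEGATIVE recent
    performance, so levels we still play badly get prioritized.
    """
    total = sum(counts)
    best, best_score = 0, float("-inf")
    for i, (n, v) in enumerate(zip(counts, values)):
        if n == 0:
            return i  # train at least once on every level first
        # Exploitation term v plus an exploration bonus for rarely-seen levels.
        score = v + math.sqrt(c * math.log(total) / n)
        if score > best_score:
            best, best_score = i, score
    return best

# We play level 0 well (perf 0.9 → reward -0.9), level 1 badly (reward -0.3
# here means perf 0.3), so level 1 gets picked for more training.
print(ucb1_pick([5, 5, 5], [-0.9, -0.3, -0.6]))  # → 1
```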
It may also be worth looking into the Learning track of the General Video Game AI competition (http://gvgai.net/). I believe that competition has precisely the setup you mentioned of three training levels plus two levels per game for evaluation (maybe this is even where your question came from?). You can have a look at what various participants in this competition are doing if their source code is available, and/or look up literature about the competition / competing entries.

Rapidminer decision tree using cross validation

I am using the ten-fold cross validation operator. This is my first time using RapidMiner, so I am confused: will I get 10 decision trees as a result? I have read that the accuracy is the average of all results, so what is the final output? The average of all?
The aim of cross validation is to output a prediction about the performance a model will produce when presented with unseen data.
For the 10 fold case, the data is split into 10 partitions. There are 10 possible ways to get 9/10 of the data to make training sets and these are used to build 10 models. These models are applied to the remaining 1 partition to produce a performance estimate. The 10 performances are averaged. The end result is an average that is a reasonable estimate of the performance of a model on unseen data.
The remaining question is: what is the model? The best answer is to use a model built on all the data, and to assume it is close enough to the 10 models used to generate the average estimate.
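What RapidMiner's operator does internally can be sketched in scikit-learn (synthetic data; the point is the structure: 10 models for the estimate, one final model on all the data):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=100, random_state=0)

# 10 folds: each model trains on 9/10 of the data, tests on the held-out 1/10.
accs = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True,
                                 random_state=0).split(X):
    tree = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    accs.append(accuracy_score(y[test_idx], tree.predict(X[test_idx])))

# The reported number is the average of the 10 fold accuracies...
print(f"estimated accuracy: {np.mean(accs):.2f}")

# ...while the deliverable model is refit on ALL the data.
final_tree = DecisionTreeClassifier(random_state=0).fit(X, y)
```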