Can I use a Restricted Boltzmann Machine for multiple-output regression?

My question: can an RBM be applied to a multiple-output regression problem, such as predicting a student's scores across several courses, or the prices of the several products a customer buys?
I am dealing with a multiple-output regression problem.
I came across a paper in which GPA scores for students in different courses were predicted using a Restricted Boltzmann Machine (RBM). However, the paper did not detail how the RBM was implemented for the multiple-output regression problem.
Please note that I am planning to develop a single model for the task (that is how I ended up here); I do not want multiple models!
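To make the goal concrete, here is a minimal sketch of the kind of single model I have in mind: an RBM as feature extractor feeding one multi-output regression head. This is only an illustration with scikit-learn and made-up data; the paper's actual method may differ, and every number below is a placeholder.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Toy data (hypothetical): 100 students, 20 binary input features,
# 5 course scores to predict per student.
rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(100, 20)).astype(float)
Y = rng.rand(100, 5)

# The RBM learns a hidden representation; one linear head then regresses
# all 5 outputs at once, so this is still a single model.
model = Pipeline([
    ("rbm", BernoulliRBM(n_components=32, learning_rate=0.05, random_state=0)),
    ("reg", LinearRegression()),
])
model.fit(X, Y)
print(model.predict(X[:3]).shape)  # (3, 5): one row per student, one column per course
```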

Related

Making smaller models from pre-existing redundant models

Sorry for the vague title.
I will just start with an example. Say that I have a pre-existing model that can classify dogs, cats, and humans. However, all I need is a model that can classify between dogs and cats (humans are not needed). The pre-existing model is heavy and redundant, so I want to make a smaller, faster model that can just do the job needed.
What approaches exist?
I thought of utilizing knowledge distillation (using the previous model as a teacher and the new model as the student) and training a whole new model.
First, prune the teacher model to have a smaller version to be used as a student in distillation. A simple regime such as magnitude-based pruning will suffice.
For distillation, since your output vectors will no longer match (the student output is 2-dimensional while the teacher's is 3-dimensional), you will have to take this into account and compute the distillation loss only over the overlapping dimensions. An alternative is layer-wise distillation, in which the output vectors are irrelevant and the distillation loss is computed from the differences between intermediate layers of the teacher and the student. In both cases, the total loss may include a term for the difference between the student output and the label, in addition to the difference between the student and teacher outputs.
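As a minimal sketch of the overlapping-dimensions idea (assuming PyTorch; the mapping of teacher classes to student classes, the temperature, and the mixing weight are all arbitrary placeholders):

```python
import torch
import torch.nn.functional as F

OVERLAP = [0, 1]  # hypothetical: teacher dims 0 and 1 are dog and cat
T = 2.0           # softmax temperature (assumed value)
ALPHA = 0.5       # weight between distillation and label loss (assumed)

def total_loss(student_logits, teacher_logits, labels):
    # Distillation term: compare distributions only on the shared classes
    # (the teacher's probabilities are renormalized over those classes).
    t_soft = F.softmax(teacher_logits[:, OVERLAP] / T, dim=1)
    s_log_soft = F.log_softmax(student_logits / T, dim=1)
    distill = F.kl_div(s_log_soft, t_soft, reduction="batchmean") * T * T
    # Supervised term: ordinary cross-entropy against the 2-class labels.
    supervised = F.cross_entropy(student_logits, labels)
    return ALPHA * distill + (1 - ALPHA) * supervised

# Example with a batch of 4 and random logits:
loss = total_loss(torch.randn(4, 2), torch.randn(4, 3), torch.tensor([0, 1, 0, 1]))
```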
For a simple task like this, it is possible that basic transfer learning after pruning would suffice: simply replace the 3-dimensional output layer with a 2-dimensional one and continue training.
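A minimal sketch of that head swap, using a torchvision ResNet-18 as a hypothetical stand-in for the pruned teacher:

```python
import torch.nn as nn
import torchvision.models as models

# Hypothetical stand-in for the pruned teacher: a network with a
# 3-class head (dog, cat, human).
model = models.resnet18(num_classes=3)

# Transfer learning: replace the 3-class output layer with a fresh
# 2-class one; every other layer keeps its existing weights.
model.fc = nn.Linear(model.fc.in_features, 2)
# ...then continue training on the dog/cat data as usual.
```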

Analyse data with degree of affection

Hello everyone! I'm a newbie studying Data Analysis.
If you'd like to see how A, B, and C affect an outcome, you can use several models, such as KNN, SVM, or logistic regression (as far as I know).
But all of them are kind of categorical, rather than measuring a degree of effect.
Let's say I'd like to show how fonts and colors contribute to the degree of attraction.
What models can I use?
A thousand thanks!
If your input consists only of categorical variables (each with only a few possible values), then there are finitely many possible samples. Therefore, the model will see finitely many distinct inputs and can produce only finitely many distinct outputs. Just a warning.
If you use, say, KNN or a random forest, you can use the L2 norm as your error metric. It will capture the fact that a rating of 1 is closer to 2 than to 5 (please don't forget to normalize).
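A minimal sketch of that setup with scikit-learn; the fonts, colors, and scores below are made-up placeholders:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy data (hypothetical): font and color as categorical inputs,
# attraction score on a 1-5 scale as the numeric target.
X = np.array([["arial", "red"], ["times", "blue"],
              ["arial", "blue"], ["times", "red"]])
y = np.array([4.0, 2.0, 3.0, 5.0])

model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),  # encode the categorical inputs
    RandomForestRegressor(n_estimators=100, random_state=0),
)
model.fit(X, y)

# L2-style metric: RMSE treats a prediction of 2 as closer to 1 than to 5.
rmse = np.sqrt(mean_squared_error(y, model.predict(X)))
print(rmse)
```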

Is it possible to train a sentiment classification model with labeled data and then use it to predict sentiment on data that is not labeled?

I want to do sentiment analysis using a machine learning (text classification) approach, for example the nltk Naive Bayes classifier.
But the issue is that only a small amount of my data is labeled: for example, 100 articles are labeled positive or negative, and 500 articles are not labeled.
I was thinking I could train the classifier on the labeled data and then try to predict the sentiments of the unlabeled data.
Is it possible?
I am a beginner in machine learning and don't know much about it.
I am using python 3.7.
Thank you in advance.
Is it possible to train a sentiment classification model with labeled data and then use it to predict sentiment on data that is not labeled?
Yes. This is basically the definition of what supervised learning is.
That is, you train on data that has labels, so that you can then put the model into production to categorize data that does not have labels.
(Any book on supervised learning will have code examples.)
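For instance, a minimal sketch with nltk's NaiveBayesClassifier; the two articles below are made-up stand-ins for your 100 labeled and 500 unlabeled ones:

```python
import nltk

# Turn each article into a bag-of-words feature dict.
def features(text):
    return {word: True for word in text.lower().split()}

# Stand-in for the 100 labeled articles.
labeled = [("great product loved it", "positive"),
           ("terrible waste of money", "negative")]
train_set = [(features(text), label) for text, label in labeled]

classifier = nltk.NaiveBayesClassifier.train(train_set)

# Predict on the unlabeled articles.
unlabeled = ["really loved this", "what a waste"]
print([classifier.classify(features(text)) for text in unlabeled])
```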
I wonder if your question might really be: can I use supervised learning to make a model, assign labels to another 500 articles, then do further machine learning on all 600 articles? Well, the answer is still yes, but the quality will fall somewhere between these two extremes:
Assign random labels to the 500. Bad results.
Get a domain expert to assign correct labels to those 500. Good results.
Your model could fall anywhere between those two extremes. It is useful to know where it is, so you know whether it is worth using the data. You can get an estimate by taking a sample, say 25 records, and having them also labeled by a domain expert. If all 25 match, there is a reasonable chance your other 475 records have also been given good labels. If, e.g., only 10 of the 25 match, the model is much closer to the random end of the spectrum, and using the other 475 records is probably a bad idea.
("10", "25", etc. are arbitrary examples; choose based on the number of different labels, and your desired confidence in the results.)

Large number of training steps results in poor performance in transfer learning

I have a question. I have used transfer learning to retrain GoogLeNet on my image classification problem. I have 80,000 images belonging to 14 categories, and I set the number of training steps to 200,000. I believe the code provided by TensorFlow has dropout implemented and trains with random shuffling of the dataset and a cross-validation approach. I do not see any overfitting in the training and classification curves, and I get high cross-validation accuracy and high test accuracy, but when I apply my model to a new dataset I get poor classification results. Does anybody know what is going on? Thanks!

Is It Efficient and Scalable for a Neural Network to Rely on Weights that Require Database Interaction?

I'm a high school senior interested in computer science, and I have been programming for almost nine years now. I've recently become interested in machine learning and have decided to implement a neural network. I haven't begun to code it yet and have been in the design stage for a while now. The objective of the program is to analyze a student's paper, along with some other information, and then predict what grade the student will receive, much like PaperRater. However, I plan to make it far more personal than PaperRater.
The program has four inputs: the student's paper, the student's id (i.e., a primary key), the teacher's id, and the course id. I am implementing this on a website where only registered, verified users can submit their papers for grading. The contents of the paper are weighed in relation to the relationship between the teacher and the student and in relation to the course difficulty. The network adapts to the teacher's grading habits for certain classes, to the relationship between the teacher and the student (e.g., if a teacher dislikes a student you might expect to see a drop in the student's grades), and to the course level (e.g., a teacher shouldn't grade a freshman's paper as harshly as a senior's).
However, this approach poses a considerable problem. There is an inherent limit where the numbers of students, teachers, and courses become too large and everything blows up! That's because there is no single magic number that can account for every combination of student, teacher, and course.
So, I've concluded that each teacher, student, and course must have an individual (albeit arbitrary) weight associated with them, not present in the Neural Network itself. The teacher's weight would describe her grading difficulty, and the student's weight would describe her ability as a writer. The weight of the course would describe the difficulty of the course. Of course, as more and more data is aggregated, the weights should adapt to become more accurate representations.
I realize that there are relations between teachers and students, teachers and courses, and students and courses; therefore, I plan to make three respective hidden layers, each of which sums the weights of its inputs and applies an activation function. How could I store the weights associated with each teacher, student, and course, though?
I have considered storing them in their respective tables, but I don't know how well that would scale (or, for that matter, whether it would work at all). I also considered storing them in a file and loading them from there, but I'm sure that would be even worse than using a database.
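To make the layout concrete, here is a sketch of what I have in mind: one scalar weight per teacher, student, and course in an ordinary SQL table, fetched by primary key at prediction time (all names are placeholders; SQLite is just for illustration):

```python
import sqlite3

conn = sqlite3.connect("weights.db")
conn.execute("CREATE TABLE IF NOT EXISTS entity_weight ("
             "kind TEXT, id INTEGER, weight REAL, "
             "PRIMARY KEY (kind, id))")

def get_weight(kind, entity_id, default=0.0):
    # One indexed lookup per entity at prediction time.
    row = conn.execute(
        "SELECT weight FROM entity_weight WHERE kind = ? AND id = ?",
        (kind, entity_id)).fetchone()
    return row[0] if row else default

def set_weight(kind, entity_id, weight):
    # Called after each training update to persist the new value.
    conn.execute("INSERT OR REPLACE INTO entity_weight VALUES (?, ?, ?)",
                 (kind, entity_id, weight))
    conn.commit()

# Example: store and fetch the weight for teacher 7.
set_weight("teacher", 7, 0.3)
w_teacher = get_weight("teacher", 7)
```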
So the main question I have is: is it (objectively) efficient, in terms of space and computational complexity, and scalable to store and manage separate, individual weights for each possible value of certain inputs in a SQL database outside of the neural network, given a finite (not necessarily small) number of possible values for those inputs, and still get reasonable output?
Either way, I would like an explanation as to why. I believe it would be just fine, but I can't justify it myself, so I'm asking for help. Thanks in advance!
(P.S.: If you notice any problems with my approach that are not covered in the scope of this question, or have general advice, please include it as an addendum to your answer or message me.)