Machine learning in RapidMiner (Select by Weights operator)

How can I manually provide input to the wei port of the Select by Weights operator using an Excel sheet? I tried the format below:
attribute weight
rlndc.opr.l_hub_centr 0.8829047139
frntl.inf.orb.r_hub_centr 0.8750850468
spp.mtr.ar.l_hub_centr 0.8646401198
frntl.md.r_hub_centr 0.8620028802
cdt.r_hub_centr 0.8334183679
frntl.inf.tr.l_hub_centr 0.8289782694
rlndc.opr.r_hub_centr 0.8274914408
I am getting the error "expected attribute but received example set".

The Select by Weights operator requires a weights object, not an example set. You can create one using the Data to Weights operator, which takes an example set and creates a weights object with all weights set to 1. Write this object to a file with Write Weights, then edit the file to contain the weights you want (it is an XML file with a specific format, hence the need to write it out first so you have a correct template). Finally, use Read Weights to read the edited file back in; the resulting weights object can be connected to the wei port of Select by Weights.
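For reference, the file produced by Write Weights is a small XML document that pairs attribute names with weight values. A minimal sketch of what the edited file might look like, using the first few attributes from the question (the exact element names and version attribute can vary between RapidMiner versions, so treat the file Write Weights actually produces as the authoritative template):
<?xml version="1.0" encoding="UTF-8"?>
<attributeweights version="5.0">
    <weight name="rlndc.opr.l_hub_centr" value="0.8829047139"/>
    <weight name="frntl.inf.orb.r_hub_centr" value="0.8750850468"/>
    <weight name="spp.mtr.ar.l_hub_centr" value="0.8646401198"/>
</attributeweights>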

Related

How to export regression model results and label by "today" in Stata?

I run a series of regression models (almost every day). I manually keep track of my results by labeling exported regression results with a date (year-month-day format). How can this be automated in Stata (with outreg2 to Word)? Here is a minimal working example:
* load data
use http://www.stata-press.com/data/r13/nlswork
* regression
reg ln_wage c.age c.wks_u i.race i.union
* export results in word document in a file appended by "today"/date
outreg2 using "C:\PATH\Results\model_1_2020_08_21.doc", word
Today's date is accessible as a c-class result. Then you just need to format that as desired.
Here is how to get a local macro automatically:
. local wanted : di %tdCY!_N!_D daily(c(current_date), "DMY")
. di "`wanted'"
2020_08_21
So you apply that like this:
. local filename "C:\whatever\model_1_`wanted'.doc"
with the understanding that (e.g.) the model number might vary too.
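Putting the pieces together, a minimal sketch of the automated export, reusing the question's own example (the path is the question's placeholder):
* load data
use http://www.stata-press.com/data/r13/nlswork
* regression
reg ln_wage c.age c.wks_u i.race i.union
* build the date-stamped file name and export
local wanted : di %tdCY!_N!_D daily(c(current_date), "DMY")
outreg2 using "C:\PATH\Results\model_1_`wanted'.doc", word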

How to train a DNN classification model using arbitrary data features stored in mysql?

I have a term project that needs to use data stored in MySQL to train a classification model using TensorFlow or anything else.
I've tried the example from https://github.com/tensorflow/docs/blob/master/site/en/r2/tutorials/keras/feature_columns.ipynb, but it took me a lot of time to process the data into a CSV file and modify the Python script. Since I need to run a lot of experiments, is there a simpler tool for training and experimenting on my MySQL dataset?
Maybe SQLFlow can meet your needs. I tried to build an SQLFlow script with the dataset you provided; it should look like this:
SELECT *
FROM Heart_Disease
TRAIN DNNClassifier /* a pre-defined TensorFlow estimator, tf.estimator.DNNClassifier */
WITH n_classes = 3, hidden_units = [10, 20] /* parameters of the Estimator class constructor */
COLUMN Age, Sex, CP, FBS .. /* from the raw data, the columns that you think will help predict the target */
LABEL Target /* label column */
INTO Heart_Disease.test_model; /* The trained model is saved to the specified data table */
It is also very easy to apply this model:
SELECT *
FROM Heart_Disease.predict
PREDICT Heart_Disease.predict_result.Target
USING Heart_Disease.test_model;
The Target column of Heart_Disease.predict is empty; the predicted Target values are saved to the Heart_Disease.predict_result table.
FYI: https://github.com/sql-machine-learning/sqlflow/blob/develop/doc/demo.md
This is my first answer. Hope I can help you.
What I think you can do is get a dump of the data from SQL, if it is not real-time and not being updated, and then use that dump for the rest. Alternatively, you can create a MySQL connection and feed that connection into pandas' read_sql function to get a DataFrame; a sketch of that approach follows below.
Also, if you're new to TensorFlow, you should look at TensorFlow's Estimator API, which will do your work. Apart from that, you may use TensorFlow's Keras API, which also eases the work of building a neural network.
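A minimal sketch of the read_sql route, feeding the resulting DataFrame into a small Keras classifier (the connection string, table name, and column names are hypothetical; adjust them to your schema):
import pandas as pd
import tensorflow as tf
from sqlalchemy import create_engine

# Hypothetical MySQL connection; requires the pymysql driver.
engine = create_engine("mysql+pymysql://user:password@localhost/mydb")
df = pd.read_sql("SELECT * FROM heart_disease", engine)

# Hypothetical schema: all feature columns numeric, integer label in "target".
X = df.drop(columns=["target"]).values.astype("float32")
y = df["target"].values

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(20, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=10, batch_size=32)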

How to load image from csv file in tensorflow

I have images saved in 0.csv files.
The format is as in the picture below.
How can I read them into TensorFlow?
Thanks!
You should use the Dataset input pipeline introduced in TensorFlow 1.4:
https://www.tensorflow.org/programmers_guide/datasets#consuming_text_data
Here's the example from the developers' guide (though you'll want to read through that guide; it's quite well written):
filenames = ["/var/data/file1.txt", "/var/data/file2.txt"]
dataset = tf.data.Dataset.from_tensor_slices(filenames)

# Use `Dataset.flat_map()` to transform each file as a separate nested dataset,
# and then concatenate their contents sequentially into a single "flat" dataset.
# * Skip the first line (header row).
# * Filter out lines beginning with "#" (comments).
dataset = dataset.flat_map(
    lambda filename: (
        tf.data.TextLineDataset(filename)
        .skip(1)
        .filter(lambda line: tf.not_equal(tf.substr(line, 0, 1), "#"))))
The Dataset preprocessing pipeline has a few nice advantages. Most of the functionality you'll need such as reading text records, shuffling, batching, etc. are reduced to one-liners. More importantly though, it forces you into writing your preprocessing pipeline in a good, modular, testable way. It takes a little bit to get used to the API, but it's time well spent.
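Since your records are CSV rows rather than plain text, you would follow this with a map step that parses each line. A minimal sketch, assuming each row of 0.csv is one flattened 28x28 grayscale image followed by an integer label, with a header row (the image size, column layout, and header are assumptions; adjust record_defaults and the reshape to your actual format):
import tensorflow as tf

NUM_PIXELS = 28 * 28

def parse_line(line):
    # One default per CSV column: pixel values as floats, label as int.
    record_defaults = [[0.0]] * NUM_PIXELS + [[0]]
    fields = tf.decode_csv(line, record_defaults)
    image = tf.reshape(tf.stack(fields[:NUM_PIXELS]), [28, 28, 1])
    label = fields[-1]
    return image, label

dataset = tf.data.TextLineDataset("0.csv").skip(1).map(parse_line)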

Caffe: Print the softmax score

In the MNIST example included with the Caffe installation:
For any given test image, how do I get the softmax scores for each category and do some processing on them, say, compute their mean and variance?
I am a newbie, so details would help me a lot. I am able to train the model and use the testing feature to get predictions, but I am not sure which files need to be edited in order to get the above results.
You can use the Python interface:
import caffe
net = caffe.Net('/path/to/deploy.prototxt', '/path/to/weights.caffemodel', caffe.TEST)
in_ = read_data(...) # this is up to you to read a sample and convert it to numpy array
out_ = net.forward(data=in_) # assuming your net expects "data" in blob
Now you have the output of your net in the dictionary out_ (keys are the names of the output blobs). You can run this in a loop over several examples, etc.
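A minimal sketch of the processing the question asks for, computing the mean and variance of the softmax scores for one image (this assumes the softmax output blob is named "prob", as in the layer definition shown below):
import numpy as np

probs = out_['prob'][0]  # softmax scores for the first image in the batch
print("scores:", probs)
print("mean:", np.mean(probs), "variance:", np.var(probs))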
I can try to answer your question. Assuming that in your deploy net the softmax layer looks like this:
layer {
  name: "prob"
  type: "Softmax"
  bottom: "fc6"
  top: "prob"
}
In the Python code that processes your data, building on the code @Shai provided, you can get the probability of each category by adding:
predicted_prob = net.blobs['prob'].data
predicted_prob will be an array that contains the probabilities for all categories.
For example, if you only have two categories, predicted_prob[0][0] will be the probability that this testing data belongs to one category and predicted_prob[0][1] will be the probability of the other one.
PS:
If you don't want to write any additional Python script: according to https://github.com/BVLC/caffe/tree/master/examples/mnist, this example automatically runs testing every 500 iterations. The "500" is defined in the solver, e.g. https://github.com/BVLC/caffe/blob/master/examples/mnist/lenet_solver.prototxt
So you would need to trace the Caffe source code that processes the solver file; I guess it should be https://github.com/BVLC/caffe/blob/master/src/caffe/solver.cpp
I am not sure solver.cpp is the correct file to look at, but in it you can see functions for testing and for calculating some values. I hope this gives you some ideas if no one else answers your question.

Test and Training Set are Not Compatible

I have seen various articles about the same issue and tried a lot of solutions, but nothing is working. Kindly advise.
I am getting an error in WEKA:
"Problem Evaluating Classifier: Test and Training Set are Not
Compatible".
I am using J48 as my algorithm.
These are my data sets:
Trainset:
https://www.dropbox.com/s/fm0n1vkwc4yj8yn/train.csv
Evalset:
https://www.dropbox.com/s/2j9jgxnoxr8xjdx/Eval.csv
(I am unable to copy and paste them here because they are long.)
I have tried "Batch Filtering" in WEKA (on the training set), but it still does not work.
EDIT: I have even converted my .csv files to .arff, but still the same issue.
EDIT 2: I have made sure the headers in both CSVs match. Even then, same issue. Please help!
A common error when converting ".csv" files to ".arff" with Weka is that values for nominal attributes appear in a different order, or not at all, from one dataset to the other.
Your evaluation ".arff" file probably looks like this (skipping irrelevant data):
@relation Eval
@attribute a321 {TRUE}
Your train ".arff" file probably looks like this (skipping irrelevant data):
@relation train
@attribute a321 {FALSE}
However, both should contain all possible values for that attribute, and in the same order:
@attribute a321 {TRUE, FALSE}
You can remedy this by post-processing your ".arff" files in a text editor and changing the header so that your nominal values appear in the same order (and quantity) from file to file.

How do I divide a dataset into training and test set?

You can use the RemovePercentage filter (package weka.filters.unsupervised.instance).
In the Explorer, just do the following (a scripted alternative is sketched after these steps):
training set:
Load the full dataset
select the RemovePercentage filter in the preprocess panel
set the correct percentage for the split
apply the filter
save the generated data as a new file
test set:
Load the full dataset (or just use undo to revert the changes to the dataset)
select the RemovePercentage filter if not yet selected
set the invertSelection property to true
apply the filter
save the generated data as a new file
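If you would rather script the split outside of the Explorer, here is a minimal sketch using pandas and scikit-learn's train_test_split instead of Weka's RemovePercentage (a different tool doing the same job; the file names and the 70/30 split are assumptions):
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("full.csv")  # hypothetical input file
train, test = train_test_split(df, test_size=0.3, random_state=42)
train.to_csv("train.csv", index=False)
test.to_csv("test.csv", index=False)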