Rapidminer for finding statistics - rapidminer

I am a beginner in RapidMiner.
While working with RapidMiner, I couldn't find how to calculate the skewness and kurtosis of an attribute in an ExampleSet.
I would also like to know whether there is a way to get a best-fit line for a scatter plot.

For skewness and kurtosis, you could either calculate it yourself using a combination of operators or you could use the R extension and use the moments package which contains these functions. I think I would use R.
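If you go the "calculate it yourself" route, the arithmetic is the same whatever tool you use. Here is a minimal Python sketch of the moment-based definitions, just to show what has to be computed. The function name is made up for illustration, and it assumes the population (divide-by-n) moments and the non-excess kurtosis, which is 3 for a normal distribution.

```python
def moment_skewness_kurtosis(values):
    """Return (skewness, kurtosis) using population central moments."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4 (divide by n, not n - 1).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # non-excess: subtract 3 for excess kurtosis
    return skewness, kurtosis


skew, kurt = moment_skewness_kurtosis([1, 2, 3, 4, 5])
```

A symmetric sample like the one above gives a skewness of zero.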
For the scatter plot question, you could use the Linear Regression operator to build a model that fits a straight line. You need to arrange for one of the attributes to be a label that is to be predicted by the other attribute.
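For reference, this is roughly what fitting a straight line by ordinary least squares amounts to for one predictor and one label. It is only a Python sketch with a hypothetical function name, not what the RapidMiner operator does internally.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return slope, intercept


slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

Here the attribute in `xs` plays the role of the regular attribute and `ys` the role of the label.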

Related

Reconstruction of shape from elliptic Fourier descriptors

I have extracted the elliptic Fourier descriptors for each otolith, but couldn't figure out how to normalize them with respect to the first harmonic or how to reconstruct mean shapes from them for each station. I tried myself, but couldn't get any results using the Momocs package. I need expert help with the R script. The data are in an Excel file.
To use "first harmonic" normalization, just call efourier() with default parameters (i.e. with norm=TRUE).
Have a look at the Details section of ?efourier, since this is usually not the best way to go (and I think that caveat is very valid for otoliths).
Feel free to contact me directly!
All the best

how to predict query topics using word-topic matrix?

I'm implementing LDA using Java. I know how the algorithm works. In the end of the training (the given iterations) I will get 2 matrices (topic-word and document-topic) that represent the set of the input documents.
My problem is that when I input a new document (query) I want to use these matrices (or any other way) to get the document-topic vector of that query. How would I do that?
Are you using Variational Inference or Gibbs Sampling?
For Gibbs sampling, a typical approach is to add the new document(s) to the inference and update only their own counters, keeping the counters for the documents you used to learn the model constant.
This is specified in equations 84 and 85 in Parameter Estimation for Text Analysis.
I guess there has to be a similar approach in VI LDA.
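To make the Gibbs sampling idea above concrete, here is a rough Python sketch (the question is about Java, but the logic carries over). It is a simplified version under stated assumptions: the trained topic-word counts are treated as completely frozen, only the query's document-topic counts are resampled, and the priors are symmetric. The function and parameter names are invented for illustration.

```python
import random


def infer_doc_topics(doc, topic_word, alpha=0.1, beta=0.01,
                     iterations=100, seed=0):
    """Estimate the topic distribution of a new document.

    doc        -- list of word ids in the query document
    topic_word -- topic_word[k][w] = trained count of word w in topic k
    """
    rng = random.Random(seed)
    num_topics = len(topic_word)
    vocab_size = len(topic_word[0])
    topic_totals = [sum(row) for row in topic_word]

    def word_prob(k, w):
        # Smoothed probability of word w under topic k (frozen counts).
        return (topic_word[k][w] + beta) / (topic_totals[k] + vocab_size * beta)

    # Random initial topic assignment for every word of the query.
    assignments = [rng.randrange(num_topics) for _ in doc]
    doc_topic = [0] * num_topics
    for k in assignments:
        doc_topic[k] += 1

    for _ in range(iterations):
        for i, w in enumerate(doc):
            doc_topic[assignments[i]] -= 1  # remove the current assignment
            weights = [word_prob(k, w) * (doc_topic[k] + alpha)
                       for k in range(num_topics)]
            new_k = rng.choices(range(num_topics), weights=weights)[0]
            assignments[i] = new_k
            doc_topic[new_k] += 1

    total = len(doc) + num_topics * alpha
    return [(doc_topic[k] + alpha) / total for k in range(num_topics)]


# Toy model: topic 0 uses words 0 and 1, topic 1 uses words 2 and 3.
trained = [[50, 50, 0, 0], [0, 0, 50, 50]]
theta = infer_doc_topics([0, 1, 0, 1, 0], trained)
```

The returned list is the document-topic vector for the query; with the toy counts above it should put most of its mass on topic 0.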

Is it possible to plot complex variable in wxMaxima or Octave?

For example, if I want to plot sin(z), where z is a complex variable, how can I do that in either Octave or Maxima?
I don't know about Octave, but here is a message about that, with some code you can try in Maxima: https://www.ma.utexas.edu/pipermail/maxima/2007/006644.html
There may be more specific information for wxMaxima -- you can try their user forum: https://sourceforge.net/p/wxmaxima/discussion/435775/
(Referring to Octave 4.0.0.)
How do you want to represent the output of the function? Plotting either the real or imaginary part of the output can be done fairly simply using a 3-dimensional graph, where the x and y axes are the real and imaginary components of z, and the vertical axis is either the real or imaginary value of sin(z). Producing those is fairly simple in Octave. Here's a link to a script you can save and run to show an example.
Simply change the g = exp(f) line to g = sin(f).
Octave-help mailing list example
Note that the imaginary part plot is commented out. Just switch the # between the different plot commands if you want to see that part.
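In case the linked script is not available, the idea of the surface data is easy to sketch. The following is a plain Python illustration (not the Octave script referred to above, and the helper names are made up): it samples sin(z) on a grid of z = x + iy and keeps the real and imaginary parts as two separate height maps.

```python
import cmath


def linspace(start, stop, count):
    """Evenly spaced values from start to stop inclusive (count >= 2)."""
    step = (stop - start) / (count - 1)
    return [start + i * step for i in range(count)]


def sample_complex_function(f, x_values, y_values):
    """Evaluate f on the grid z = x + iy.

    Returns two nested lists indexed as [row for y][column for x]:
    the real parts and the imaginary parts of f(z).
    """
    real_part = [[f(complex(x, y)).real for x in x_values] for y in y_values]
    imag_part = [[f(complex(x, y)).imag for x in x_values] for y in y_values]
    return real_part, imag_part


xs = linspace(-3.0, 3.0, 61)
ys = linspace(-2.0, 2.0, 41)
re_sin, im_sin = sample_complex_function(cmath.sin, xs, ys)
```

Either grid can then be handed to whatever surface plotter you prefer, with x and y on the horizontal axes and the grid values as the height.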
Now, are you instead looking for options to map the Z plane (z=x+iy) to the W plane (w=u+iv) and represent closed contours mapped by w=sin(z)? In that case you'll need to do parametric plotting as described on this FIT site. There is a link to his Matlab program at the bottom of the explanation that provides one method of using color coding to match z->w plane contour mapping.
Those m-files are written for Matlab, so a few things do not work, but the basic plotting is compatible with Octave 4.0.0. (the top level ss13.m file will fail on calls to flops and imwrite)
But, if you put your desired function in myfun13.m for f, df and d2f, (sin(z), cos(z), -sin(z) respectively), then run cvplot13, you'll get color maps showing the correspondence between z and w planes.
wxMaxima has a plot3d that can do it. Since the expression to plot is in terms of x and y, I plotted the function's magnitude with abs(f(x+%i*y)):
plot3d(abs((x+%i*y-3)*(x+%i*y-5)*(x+%i*y-6)), [x,2,7], [y,-1,1], [grid,100,100], [z,0,5])$

Using the Z-transform properties with the inverse z transform by MATLAB

I am using the Z-transform properties with the inverse Z-transform by MATLAB.
I cannot find a function that would apply the Z-transform properties to convert the result of residuez to the time domain. Here is the image of the question:
Edit
I can't believe such a powerful math language does not have functions to convert using the Z-transform table and Laplace transform tables.
You will find all you need in Partial Fraction Expansion (PDF document).
EDIT:
One possible source for a Table of Laplace and Z Transforms.
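For the common case, the table lookup is short enough to do by hand. The sketch below is Python rather than MATLAB and rests on several assumptions: the sequence is causal, all poles are simple (not repeated), and there are no direct terms, so each partial fraction r/(1 - p z^-1) maps to r p^n for n >= 0. The function names and the example transform are made up for illustration.

```python
def inverse_z_from_residues(residues, poles, num_samples):
    """x[n] = sum_i r_i * p_i**n for n = 0 .. num_samples - 1."""
    return [sum(r * p ** n for r, p in zip(residues, poles))
            for n in range(num_samples)]


def impulse_response(b, a, num_samples):
    """Impulse response of B(z)/A(z) by direct recursion (check only)."""
    y = []
    for n in range(num_samples):
        acc = b[n] if n < len(b) else 0.0
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc / a[0])
    return y


# X(z) = 1 / (1 - 0.75 z^-1 + 0.125 z^-2)
#      = 2 / (1 - 0.5 z^-1) - 1 / (1 - 0.25 z^-1)
x_table = inverse_z_from_residues([2.0, -1.0], [0.5, 0.25], 6)
x_check = impulse_response([1.0], [1.0, -0.75, 0.125], 6)
```

The two lists should agree, which is a handy sanity check on residues and poles obtained from a partial fraction expansion.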

"Reverse" statistics: generating data based on mean and standard deviation

Having a dataset and calculating statistics from it is easy. How about the other way around?
Let's say I know some variable has an average X, standard deviation Y and assume it has normal (Gaussian) distribution. What would be the best way to generate a "random" dataset (of arbitrary size) which will fit the distribution?
EDIT: This kind of develops from this question; I could make something based on that method, but I am wondering if there's a more efficient way to do it.
You can generate standard normal random variables with the Box-Muller method. Then, to transform them to have mean mu and standard deviation sigma, multiply your samples by sigma and add mu. I.e. for each z from the standard normal, return mu + sigma*z.
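A minimal Python sketch of that recipe, using the basic (cosine) form of the Box-Muller transform; the function names and the seed parameter are only for illustration.

```python
import math
import random


def box_muller(rng):
    """Return one standard normal value from two uniform values."""
    u1 = 1.0 - rng.random()  # shift to (0, 1] so that log(u1) is defined
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


def generate_normal(mu, sigma, size, seed=None):
    """Generate `size` values with mean mu and standard deviation sigma."""
    rng = random.Random(seed)
    return [mu + sigma * box_muller(rng) for _ in range(size)]


data = generate_normal(100.0, 15.0, 20000, seed=1)
```

The sample mean and standard deviation of `data` will be close to, but not exactly, 100 and 15.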
This is really easy to do in Excel with the norminv() function. Example:
=norminv(rand(), 100, 15)
would generate a value from a normal distribution with mean of 100 and stdev of 15 (human IQs). Drag this formula down a column and you have as many values as you want.
I found a page where this problem is solved in several programming languages:
http://rosettacode.org/wiki/Random_numbers
There are several methods to generate Gaussian random variables. The standard method is Box-Muller, which was mentioned earlier. A slightly faster alternative is the ziggurat algorithm:
http://en.wikipedia.org/wiki/Ziggurat_algorithm
Here's the Wikipedia reference on generating Gaussian variables:
http://en.wikipedia.org/wiki/Normal_distribution#Generating_values_from_normal_distribution
I'll give an example using R and the 2nd algorithm in the list here.
X<-4; Y<-2 # mean and std
z <- sapply(rep(0,100000), function(x) (sum(runif(12)) - 6) * Y + X)
plot(density(z))
> mean(z)
[1] 4.002347
> sd(z)
[1] 2.005114
> library(fUtilities)
> skewness(z,method ="moment")
[1] -0.003924771
attr(,"method")
[1] "moment"
> kurtosis(z,method ="moment")
[1] 2.882696
attr(,"method")
[1] "moment"
You could make it a kind of Monte Carlo simulation. Start with a wide random "acceptable range" and generate a few truly random values. Check your statistics and see if the average and variance are off. Adjust the "acceptable range" for the random values and add a few more values. Repeat until you have hit both your requirements and your population sample size.
Just off the top of my head, let me know what you think. :-)
The MATLAB function normrnd from the Statistics Toolbox can generate normally distributed random numbers with a given mu and sigma.
It is easy to generate a dataset with a normal distribution (see http://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform ).
Remember that the generated sample will not have an exact N(0,1) distribution! You need to standardize it: subtract the sample mean and then divide by the sample standard deviation. Then you are free to transform this sample to a normal distribution with the given parameters: multiply by the standard deviation and then add the mean.
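Those two steps can be sketched in Python as follows. This is only an illustration: the function name is made up, it uses the sample (n - 1) standard deviation, and the raw values come from the standard library generator rather than a hand-written Box-Muller.

```python
import random
import statistics


def sample_with_exact_moments(mean, std, size, seed=None):
    """Normal-looking sample whose mean and std match the targets exactly."""
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in range(size)]
    # Step 1: standardize, so the sample has mean 0 and std 1 exactly.
    raw_mean = statistics.mean(raw)
    raw_std = statistics.stdev(raw)
    standardized = [(v - raw_mean) / raw_std for v in raw]
    # Step 2: rescale to the requested parameters.
    return [mean + std * v for v in standardized]


data = sample_with_exact_moments(100.0, 15.0, 1000, seed=42)
```

Up to floating-point rounding, `statistics.mean(data)` and `statistics.stdev(data)` then return the requested values.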
Interestingly numpy has a prebuilt function for that:
import numpy as np

def generate_dataset(mean, std, samples):
    # Draw `samples` values from a normal distribution N(mean, std**2).
    dataset = np.random.normal(mean, std, samples)
    return dataset