Normalization method - phyloseq

I am a PhD student working on microbiome 16S rRNA sequence data. I have this code for phyloseq normalization from a previous student and wanted to ask if someone could help explain exactly what it is doing:
ps_norm <- transform_sample_counts(ps, function(x) x / sum(x) )
Thank you

You can find the documentation of transform_sample_counts here
This function transforms the sample counts of a taxa abundance matrix according to a user-provided function. The counts of each sample will be transformed individually. No sample-sample interaction/comparison is possible by this method.
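In the code from the question, the user-provided function is x / sum(x), so each sample's counts are divided by that sample's total, which turns raw counts into relative abundances (proportions that sum to 1 within each sample). A minimal NumPy sketch of the same arithmetic, assuming a made-up count table with taxa as rows and samples as columns (this is an illustration, not phyloseq itself):

```python
import numpy as np

# Hypothetical count table: rows are taxa, columns are samples.
counts = np.array([
    [10.0,  0.0,  5.0],
    [30.0, 20.0,  5.0],
    [60.0, 80.0, 10.0],
])

# Same arithmetic as function(x) x / sum(x), applied to each sample (column).
rel_abund = counts / counts.sum(axis=0, keepdims=True)

print(rel_abund)              # proportions per sample
print(rel_abund.sum(axis=0))  # each column sums to 1
```

A sample whose total count is zero would produce NaN here, so such samples are usually dropped before this step.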

Need hint for the Exercise posed in the Tensorflow Convolution Neural Networks Tutorial

Below is the exercise question posed on this page https://www.tensorflow.org/versions/0.6.0/tutorials/deep_cnn/index.html
EXERCISE: The output of inference are un-normalized logits. Try
editing the network architecture to return normalized predictions
using tf.softmax().
In the spirit of the exercise, I want to know if I'm on the right-track (not looking for the coded-up answer).
Here's my proposed solution.
Step 1: The last layer (of the inference) in the example is a "softmax_linear", i.e., it simply does the unnormalized WX+b transformation. As stipulated, we apply the tf.nn.softmax operation with softmax_linear as input. This normalizes the output as probabilities on the range [0, 1].
Step 2: The next step is to modify the cross-entropy calculation in the loss-function. Since we already have normalized output, we need to replace the tf.nn.softmax_cross_entropy_with_logits operation with a plain cross_entropy(normalized_softmax, labels) function (that does not further normalize the output before calculating the loss). I believe this function is not available in the tensorflow library; it needs to be written.
That's it. Feedback is kindly solicited.
Step 1 is more than sufficient if you insert the tf.nn.softmax() in cifar10_eval.py (and not in cifar10.py). That way the loss in cifar10.py still receives the unnormalized logits, so Step 2 is unnecessary. For example:
logits = cifar10.inference(images)
normalized_logits = tf.nn.softmax(logits)
top_k_op = tf.nn.in_top_k(normalized_logits, labels, 1)
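For intuition on Step 2 of the question, here is a small sketch in plain NumPy rather than TensorFlow. The helper names are hypothetical and the logits are made-up numbers. It shows that a plain cross-entropy on softmax output gives the same loss as a cross-entropy computed directly from logits, and that softmax does not change which class ranks highest, which is why applying it only at evaluation time is harmless.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy_from_probs(probs, labels):
    # Plain cross-entropy: expects already-normalized probabilities.
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

def cross_entropy_from_logits(logits, labels):
    # Normalizes internally via a log-softmax, then takes the loss.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Made-up logits for two examples and three classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([0, 1])

probs = softmax(logits)
loss_probs = cross_entropy_from_probs(probs, labels)
loss_logits = cross_entropy_from_logits(logits, labels)
print(loss_probs, loss_logits)
```

Applying softmax twice (once in the network, once inside a logits-based loss) would give a different, incorrect loss, which is the pitfall Step 2 is trying to avoid.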

Derivative of a Function in Modelica

First, excuse me for not providing a minimal working example; I just can't think of one. I'll give some pieces of code and ask my question "in principle".
I'm doing thermophysical properties calculation with a real gas model (Peng-Robinson) and here I am having problems when translating a model, where I use pressure p and specific enthalpy h as inputs to calculate all other properties. When it comes to calculating the temperature T, it is linked to the enthalpy h via an equation called departure function, which is itself a function of T. In Modelica it looks like this:
Dh_real = R_m*T*(Z - 1) + (T*dadT - a)/(sqrt(8)*b)*log((Z + (1 + sqrt(2))*B)/(Z + (1 - sqrt(2))*B));
Here a, dadT and Z are also temperature-dependent scalars, partly calculated in functions using matrix operations (dadT) or polynomial root calculation (Z); b and B are parameters.
Calculating the enthalpy from an input temperature (in another model) is straightforward and working fine, the solver can solve the departure function analytically. The other direction has to be solved numerically and this is, I think, why Dymola gives me this error, when translating.
Cannot find differentiation function:
DadT_Unique2([some parameters and T])
with respect to time
Failed to differentiate the equation
dadT = DadT_Unique2([some parameters and T]);
in order to reduce the DAE index.
Failed to reduce the DAE index.
Now DadT is a function within the model, where I use some simple matrix operations to calculate dadT from some parameters and the temperature T. Obviously, Dymola is in need of the derivative of some internal _Unique2-function.
I couldn't find anything about this in the specification or on the web. Can I provide a derivative of the functions somehow? I tried the smoothOrder annotation, but without effect. How can I deal with this?
This is not a full answer, but a list of interesting links that you should read:
Michael Tiller on annotation(derivative=dxyz) and other annotations:
http://book.xogeny.com/behavior/functions/func_annos/#derivative
Claytex on numerical Jacobians and flag Hidden.PrintFailureToDifferentiate:
http://www.claytex.com/blog/how-can-i-make-my-models-run-faster/
Two related questions here on StackOverflow:
Dymola solving stationary equation systems for Media-Model
Two-Phase Modelica Media example
Some related Modelica conference papers:
https://modelica.org/events/Conference2005/online_proceedings/Session1/Session1c2.pdf
http://dx.doi.org/10.3384/ecp15118647
http://dx.doi.org/10.3384/ecp15118653
Cubic equation of state, generalized form (table 4.2)
https://books.google.de/books?id=_Op6DQAAQBAJ&pg=PA187
Solving cubic equations of state:
http://dx.doi.org/10.1002/aic.690480421
https://books.google.com/books?id=dd410GGw8wUC&pg=PA48
https://books.google.com/books?id=1rOA5I6kQ7gC&pg=PA620 (Appendix C)
Rewriting partial derivatives:
https://scholar.google.com/scholar?cluster=3379879976574799663

How to optimize function to get highest coefficient in linear regression?

I am building a typical multivariate linear regression, except that one of the variables, rather than being a simple data point, is a function of one of the other variables. So for example, my regression may look like:
y1=c1*x1+c2*x2+c3*x3+c4*f(x3)
f itself contains coefficients a,b,c,d
This particular function is of the form, f(x)=a - b/(1 + e^(-c(x-d)))
Basically, the point of my research is to find which values of a, b, c, and d lead to the highest value of x4, and, hopefully, the best model.
I'm pretty inexperienced in R, but my advisor told me he thinks it would be the best program to get this kind of thing done in... Anyone have any advice on where to start with this problem?
Check out nonlinear least squares. For an R implementation, see nls.
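As a rough illustration of the nonlinear least squares idea, here is a sketch using SciPy's curve_fit in place of R's nls. The data are synthetic and noise-free, and the parameter values and starting guesses are invented for the example. Note that c4 cannot be identified separately from a and b, because only the products c4*a and c4*b enter the model, so the sketch fits A = c4*a and B = c4*b instead.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import expit  # expit(z) = 1 / (1 + exp(-z))

def model(X, c1, c2, c3, A, B, c, d):
    # y = c1*x1 + c2*x2 + c3*x3 + c4*f(x3), with A = c4*a and B = c4*b.
    x1, x2, x3 = X
    return c1 * x1 + c2 * x2 + c3 * x3 + A - B * expit(c * (x3 - d))

# Synthetic, noise-free data (assumed values, for illustration only).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.uniform(-4.0, 5.0, size=n)
true_params = np.array([1.5, -0.7, 0.3, 2.0, 4.0, 1.2, 0.5])
y = model((x1, x2, x3), *true_params)

# Nonlinear fits need a starting guess; a poor one can end in a local minimum.
p0 = [1.0, -1.0, 0.0, 1.0, 3.0, 1.0, 0.0]
fitted, _ = curve_fit(model, (x1, x2, x3), y, p0=p0)
print(fitted)
```

With real, noisy data it is worth trying several starting guesses and comparing the residual sum of squares of the resulting fits.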

How to get the predicted values in training data set for Least Squares Support Vector Regression

I would like to make predictions using the Least Squares Support Vector Machine for regression proposed by Suykens et al. I am using LS-SVMlab, a MATLAB toolbox that you can find here. Suppose I have an independent variable X and a dependent variable Y, both simulated. I am following the instructions in the tutorial.
>> X = linspace(-1,1,50)';
>> Y = (15*(X.^2-1).^2.*X.^4).*exp(-X)+normrnd(0,0.1,length(X),1);
>> type = 'function estimation';
>> [gam,sig2] = tunelssvm({X,Y,type,[],[],'RBF_kernel'},'simplex','leaveoneoutlssvm',{'mse'});
>> [alpha,b] = trainlssvm({X,Y,type,gam,sig2,'RBF_kernel'});
>> plotlssvm({X,Y,type,gam,sig2,'RBF_kernel'},{alpha,b});
The code above finds the best parameters using the simplex method and leave-one-out cross-validation, trains the model, and gives me the alphas (support vector values for all the data points in the training set) and the b coefficient. However, it does not give me the predictions of the variable Y; it only draws the plot. In some articles, I saw plots like the one below.
As I said before, the LS-SVM toolbox does not give me the predicted values of Y, it only draws the plot but no values in the workspace. How can I get these values and draw a graph of predicted values together with actual values?
There is one solution I can think of: using the X values in the training set, I re-run the model and get the predicted Y values with the simlssvm command, but that does not seem reasonable to me. Is there another solution you can offer? Thanks in advance.
I am afraid you have answered your own question. The only way to obtain the prediction for the training points in LS-SVMLab is by simulating the training points after training your model.
[yp,alpha,b,gam,sig2,model] = lssvm(x,y,'f')
When you use this function, yp is the predicted value.
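To see why evaluating the trained model on the training inputs is the natural way to get these values, here is a NumPy sketch of the LS-SVM regression math. It is not LS-SVMlab's code: the function names are hypothetical, the RBF kernel is assumed to be exp(-(x - z)^2 / sig2), and gam and sig2 are arbitrary values rather than tuned ones. Training only solves a linear system for alpha and b; predictions for any inputs, including the training ones, come from a separate kernel evaluation.

```python
import numpy as np

def rbf_kernel(a, b, sig2):
    # Assumed kernel form: exp(-(a_i - b_j)^2 / sig2) for 1-D inputs.
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / sig2)

def train_lssvm(x, y, gam, sig2):
    # Solve [[0, 1^T], [1, K + I/gam]] [b; alpha] = [0; y].
    n = len(x)
    system = np.zeros((n + 1, n + 1))
    system[0, 1:] = 1.0
    system[1:, 0] = 1.0
    system[1:, 1:] = rbf_kernel(x, x, sig2) + np.eye(n) / gam
    solution = np.linalg.solve(system, np.concatenate(([0.0], y)))
    return solution[1:], solution[0]  # alpha, b

def predict_lssvm(x_train, alpha, b, x_new, sig2):
    # Prediction is a kernel expansion over the training points plus the bias.
    return rbf_kernel(x_new, x_train, sig2) @ alpha + b

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = 15 * (x**2 - 1) ** 2 * x**4 * np.exp(-x) + rng.normal(0, 0.1, x.size)

gam, sig2 = 10.0, 0.1  # arbitrary, untuned values
alpha, b = train_lssvm(x, y, gam, sig2)
y_pred = predict_lssvm(x, alpha, b, x, sig2)  # predictions on the training set
```

The arrays y and y_pred can then be plotted against x to compare actual and predicted values.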

"Reverse" statistics: generating data based on mean and standard deviation

Having a dataset and calculating statistics from it is easy. How about the other way around?
Let's say I know some variable has an average X and a standard deviation Y, and I assume it has a normal (Gaussian) distribution. What would be the best way to generate a "random" dataset (of arbitrary size) which will fit the distribution?
EDIT: This kind of develops from this question; I could make something based on that method, but I am wondering if there's a more efficient way to do it.
You can generate standard normal random variables with the Box-Muller method. Then, to transform them to have mean mu and standard deviation sigma, multiply your samples by sigma and add mu. I.e., for each z from the standard normal, return mu + sigma*z.
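The recipe above can be sketched in a few lines of Python using only the standard library. The function names and the example mean and standard deviation (100 and 15) are chosen for illustration.

```python
import math
import random
import statistics

def box_muller(rng):
    # Turn two uniform draws into two independent standard normal draws.
    u1 = 1.0 - rng.random()  # lies in (0, 1], so log(u1) is always defined
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)

def normal_samples(mu, sigma, n, seed=None):
    # Shift and scale standard normal draws: mu + sigma * z.
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        z0, z1 = box_muller(rng)
        out.extend((mu + sigma * z0, mu + sigma * z1))
    return out[:n]

samples = normal_samples(100.0, 15.0, 100_000, seed=42)
print(statistics.mean(samples), statistics.stdev(samples))
```

The sample mean and standard deviation will be close to, but not exactly, 100 and 15.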
This is really easy to do in Excel with the norminv() function. Example:
=norminv(rand(), 100, 15)
would generate a value from a normal distribution with mean of 100 and stdev of 15 (human IQs). Drag this formula down a column and you have as many values as you want.
I found a page where this problem is solved in several programming languages:
http://rosettacode.org/wiki/Random_numbers
There are several methods to generate Gaussian random variables. The standard method is Box-Muller, which was mentioned earlier. A slightly faster version is here:
http://en.wikipedia.org/wiki/Ziggurat_algorithm
Here's the wikipedia reference on generating Gaussian variables
http://en.wikipedia.org/wiki/Normal_distribution#Generating_values_from_normal_distribution
I'll give an example using R and the 2nd algorithm in the list here.
X<-4; Y<-2 # mean and std
z <- sapply(rep(0,100000), function(x) (sum(runif(12)) - 6) * Y + X)
plot(density(z))
> mean(z)
[1] 4.002347
> sd(z)
[1] 2.005114
> library(fUtilities)
> skewness(z,method ="moment")
[1] -0.003924771
attr(,"method")
[1] "moment"
> kurtosis(z,method ="moment")
[1] 2.882696
attr(,"method")
[1] "moment"
You could make it a kind of Monte Carlo simulation. Start with a wide random "acceptable range" and generate a few truly random values. Check your statistics and see if the average and variance are off. Adjust the "acceptable range" for the random values and add a few more values. Repeat until you have hit both your requirements and your population sample size.
Just off the top of my head, let me know what you think. :-)
The MATLAB function normrnd from the Statistics Toolbox can generate normally distributed random numbers with a given mu and sigma.
It is easy to generate a dataset with a normal distribution (see http://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform ).
Remember that the generated sample's mean and standard deviation will not be exactly 0 and 1! If you need them to match exactly, standardize the sample first: subtract its mean and then divide by its standard deviation. You are then free to transform the sample to a normal distribution with the given parameters: multiply by the target standard deviation and then add the target mean.
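A short NumPy sketch of this standardize-then-rescale step, with a hypothetical function name and example values. After standardizing, the sample mean and sample standard deviation match the targets up to floating-point error, at the cost that the values are no longer strictly independent draws.

```python
import numpy as np

def exact_moment_sample(mean, std, n, seed=None):
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    # Standardize: force sample mean 0 and sample standard deviation 1.
    z = (z - z.mean()) / z.std(ddof=1)
    # Rescale to the requested parameters.
    return mean + std * z

data = exact_moment_sample(4.0, 2.0, 1000, seed=0)
print(data.mean(), data.std(ddof=1))
```

Here ddof=1 selects the sample (n - 1) standard deviation; use ddof=0 consistently instead if the population formula is the one you need to match.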
Interestingly numpy has a prebuilt function for that:
import numpy as np

def generate_dataset(mean, std, samples):
    # Draw `samples` values from a normal distribution with the given mean and std.
    dataset = np.random.normal(mean, std, samples)
    return dataset