I am using a naive Bayes model for binary classification with a combination of discrete and continuous variables. My question is: can I use different conditional probability distribution (CPD) functions for the continuous and the discrete observation variables?
For example, could I use a Gaussian CPD for the continuous variables and some deterministic CPD for the discrete variables?
Thank you
Yes, it is normal to mix continuous and discrete variables within the same model. Consider the following example.
Suppose I have two random variables:
T - the temperature today
D - the day of the week
Note T is continuous and D is discrete. Suppose I want to predict whether John will go to the beach, represented by the binary variable B. Then I could set up my inference as follows, assuming T and D are conditionally independent given B.
$$p(B \mid T, D) = \frac{p(T \mid B)\, p(D \mid B)\, p(B)}{p(T, D)} \propto p(T \mid B)\, p(D \mid B)\, p(B)$$
p(T|B) could be a Gaussian distribution, p(D|B) could be a discrete distribution, and p(B) could be a discrete prior on how often John goes to the beach.
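As a rough illustration of the example above, here is a minimal Python sketch that combines a Gaussian p(T|B) with a discrete p(D|B) and a discrete prior p(B). All numbers (means, standard deviations, day probabilities, prior) are made up for illustration, and the names `posterior_beach`, `day_prob` and `temp_params` are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

# Hypothetical parameters, invented for this sketch only.
prior = {True: 0.3, False: 0.7}                         # p(B)
temp_params = {True: (28.0, 3.0), False: (18.0, 5.0)}   # (mean, std) of p(T|B)
WEEKEND = {"Sat", "Sun"}

def day_prob(day, goes_to_beach):
    """Discrete p(D|B): weekend-heavy if B is true, uniform otherwise."""
    if goes_to_beach:
        return 0.3 if day in WEEKEND else 0.08   # 2*0.3 + 5*0.08 = 1.0
    return 1.0 / 7.0

def posterior_beach(temp, day):
    """p(B = True | T = temp, D = day) under the naive Bayes assumption."""
    joint = {}
    for b in (True, False):
        mean, std = temp_params[b]
        joint[b] = gaussian_pdf(temp, mean, std) * day_prob(day, b) * prior[b]
    evidence = joint[True] + joint[False]   # normalising constant p(T, D)
    return joint[True] / evidence
```

Calling `posterior_beach(30.0, "Sat")` and `posterior_beach(30.0, "Mon")` shows how the discrete and the continuous evidence each shift the posterior.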
I am building a regression model to assess how a certain outcome, tracked from 2015-2018, changed in the year 2018 specifically relative to 2015-2017. However, the outcome underwent a natural year-by-year decline that I would also like to capture in the regression model. As a result, I am currently using a variable X as my independent variable (X=0 for 2015-2017 vs. X=1 for 2018), and a variable Y as a confounder (modeled continuously) to adjust for changes across the entire study period (Y=0 for 2015, Y=1 for 2016, Y=2 for 2017, Y=3 for 2018).
However, as you can see in the table below, there is a great deal of collinearity between these two variables. One solution would be to remove confounder Y from the model, but I believe it is important to capture the year-by-year change from 2015-2017. Is there an alternative way I can set up this model, or an alternative methodology I can use (e.g. time series), to perform this analysis? Thank you very much!
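To put a number on the collinearity between the two codings, here is a small Python sketch. It assumes one row per year (i.e. equally many observations in each year), which may not match the real data; the names `pearson`, `indicator_2018` and `year_trend` are hypothetical.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)

years = [2015, 2016, 2017, 2018]
indicator_2018 = [1 if y == 2018 else 0 for y in years]   # the 0/1 variable
year_trend = [y - 2015 for y in years]                    # the 0..3 variable

r = pearson(indicator_2018, year_trend)
# With exactly two predictors, the variance inflation factor is 1 / (1 - r^2).
vif = 1.0 / (1.0 - r ** 2)
print(f"correlation = {r:.3f}, VIF = {vif:.2f}")
```

Under that balanced-years assumption the two codings are correlated but not perfectly so, which means both coefficients remain estimable, just with inflated standard errors.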
I'm using a 1D CNN on temporal data. Let's say that I have two features A and B. The ratio between A and B (i.e. A/B) is important - let's call this feature C. I'm wondering if I need to explicitly calculate and include feature C, or can the CNN theoretically infer feature C from the given features A and B?
I understand that in deep learning, it's best to exclude highly-correlated features (such as feature C), but I don't understand why.
The short answer is NO. Using the standard DNN layers will not automatically capture this A/B relationship, because standard layers like Conv/Dense will only perform the matrix multiplication operations.
To simplify the discussion, let us assume that your input feature is two-dimensional, where the first dimension is A and the second is B. Applying a Conv layer to this feature simply learns a weight matrix w and a bias b:
y = w * [f_A, f_B] + b = w_A * f_A + w_B * f_B + b
As you can see, a layer of this form only scales and adds its inputs, so it has no term that can reproduce the ratio operation between A and B.
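One way to check this numerically: for any unit of the form w_A * f_A + w_B * f_B + b, the second difference along f_B is exactly zero, while for the ratio f_A / f_B it is not. The Python sketch below uses arbitrary weights chosen purely for illustration; `linear_unit` and `second_difference` are hypothetical helper names.

```python
def linear_unit(f_a, f_b, w_a=2.0, w_b=-3.0, bias=1.0):
    """A single linear unit y = w_A * f_A + w_B * f_B + b (arbitrary weights)."""
    return w_a * f_a + w_b * f_b + bias

def ratio(f_a, f_b):
    """The target feature C = A / B."""
    return f_a / f_b

def second_difference(fn, f_a, f_b, h):
    """Discrete curvature of fn along the B axis; zero for any affine fn."""
    return fn(f_a, f_b + h) - 2.0 * fn(f_a, f_b) + fn(f_a, f_b - h)

curvature_linear = second_difference(linear_unit, 4.0, 2.0, 1.0)
curvature_ratio = second_difference(ratio, 4.0, 2.0, 1.0)
print(curvature_linear, curvature_ratio)
```

Whatever weights are learned, the linear unit stays flat in this sense, whereas A/B is curved in B.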
You don't have to use the feature C in the same way as feature A and B. Instead, it may be a better idea to keep feature C as an individual input, because its dynamic range may be very different from those of A and B. This means that you can have a multiple-input network, where each input has its own feature extraction layers and the resulting features from both inputs can be concatenated together to predict your target.
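A minimal sketch of the preprocessing side of that idea, in plain Python: compute C explicitly and scale it separately from A and B, so that it can be fed to its own input branch. It assumes B is non-negative (a small `eps` guards against division by zero) and uses toy values; `ratio_feature` and `standardize` are hypothetical helpers, and the network itself is left out.

```python
import math

def ratio_feature(a, b, eps=1e-8):
    """Element-wise C = A / (B + eps); assumes B is non-negative."""
    return [x / (y + eps) for x, y in zip(a, b)]

def standardize(values):
    """Zero-mean, unit-variance scaling of one feature sequence."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        return [0.0] * n
    return [(v - mean) / std for v in values]

# Toy time series for features A and B.
a = [10.0, 20.0, 30.0, 40.0]
b = [2.0, 4.0, 5.0, 10.0]
c = ratio_feature(a, b)

# Input for the first branch: A and B, each scaled on its own.
branch_ab = list(zip(standardize(a), standardize(b)))
# Input for the second branch: C, scaled independently of A and B.
branch_c = standardize(c)
```

Scaling C on its own is what addresses the dynamic-range concern; whether the two branches are then concatenated early or late is a design choice to experiment with.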
I am trying to construct an RNN to predict the probability of a player playing a match, along with the runs scored and wickets taken by the player. I would use an LSTM so that performance in the current match influences the player's future selection.
Architecture summary:
Input features: Match details - Venue, teams involved, team batting first
Input samples: Player roster of both teams.
Output:
Discrete: Binary: Did the player play.
Discrete: Wickets taken.
Continuous: Runs scored.
Continuous: Balls bowled.
Question:
Most often an RNN uses "Softmax" or "MSE" in the final layer to process "a" from the LSTM, providing only a single variable "Y" as output. But here there are four dependent variables (2 discrete and 2 continuous). Is it possible to stitch together all four as output variables?
If yes, how do we handle the mix of continuous and discrete outputs in the loss function?
(Though the output from LSTM "a" has multiple features and carries the information to the next time-slot, we need multiple features at output for training based on the ground-truth)
You just do it. Without more detail on the software (if any) in use, it is hard to give more detail.
At every time step, the output of the LSTM unit is one of the hidden layers of your network.
You can then feed it into 4 output layers:
1 sigmoid
2 I'd mess around with this a bit. Maybe 4x sigmoid (4 wickets to an innings, right?), or relu4
3,4 linear (squaring it is also an option, or relu)
For training purposes your loss function is the sum of your 4 individual losses.
If they were all MSE, you could concatenate your 4 outputs before calculating the loss.
But since the first is cross-entropy (for a decision sigmoid), you'd calculate the losses separately and sum them.
You can still concatenate the outputs afterwards to have a single output vector.
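To make the "sum of 4 individual losses" concrete, here is a framework-free Python sketch for one player at one time step. It assumes a sigmoid plus cross-entropy for "did the player play" and plain squared error for the other three heads (the wickets head could just as well use one of the alternatives mentioned above). The `weights` argument and the function names are hypothetical additions for illustration.

```python
import math

def sigmoid(z):
    """Logistic function mapping a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(p, target, eps=1e-12):
    """Cross-entropy for one binary target; p is clipped for stability."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))

def squared_error(pred, target):
    """Squared error for one continuous (or count) target."""
    return (pred - target) ** 2

def total_loss(head_outputs, targets, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four per-head losses.

    head_outputs: (played_logit, wickets_pred, runs_pred, balls_pred)
    targets:      (played, wickets, runs, balls)
    """
    played_logit, wickets_pred, runs_pred, balls_pred = head_outputs
    played, wickets, runs, balls = targets
    losses = (
        binary_cross_entropy(sigmoid(played_logit), played),
        squared_error(wickets_pred, wickets),
        squared_error(runs_pred, runs),
        squared_error(balls_pred, balls),
    )
    return sum(w * l for w, l in zip(weights, losses))

example = total_loss((0.0, 2.0, 30.0, 24.0), (1, 2, 30, 24))
```

Because runs, balls and wickets live on different scales, the unweighted sum may be dominated by one head; per-head weights (or rescaled targets) are one simple way to balance them.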
I have a good (or at least a self-consistent) calibration set and have applied PCA and, more recently, PLS regression to n.i.r. spectra of known mixtures of water and additive to predict the percentage of additive by volume. So far I have only done self-calibration, and I now want to predict the concentration blindly from the n.i.r. spectrum. Octave returns XLOADINGS, YLOADINGS, XSCORES, YSCORES, COEFFICIENTS, and FITTED with the plsregress command. FITTED is the estimate of concentration. Octave uses the SIMPLS approach.
How do I use these returned variables to predict the concentration given a new sample's spectrum?
Scores are usually denoted by T and loadings by P and X=TP'+E where E is the residual. I am stuck.
Note that T and P are X scores and loadings, respectively. Unlike PCA, PLS has scores and loadings for Y as well (usually denoted U and Q).
While the documentation of plsregress is sketchy at best, the paper it refers to (Sijmen de Jong: SIMPLS: an alternative approach to partial least squares regression, Chemom Intell Lab Syst, 1993, 18, 251-263, DOI: 10.1016/0169-7439(93)85002-X) discusses prediction with equations (36) and (37), which give:
Yhat0 = X0 B
Note that this uses centered data X0 to predict centered y-values. B are the COEFFICIENTS.
I recommend that as a first step you predict your training spectra and make sure you get the correct results (FITTED).
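The centering step can be sketched as follows in plain Python. This assumes, as stated above, that the coefficients map centered spectra to centered concentrations; if your version of plsregress returns an extra intercept row, the arithmetic differs, so check against FITTED first. The spectra and the coefficient vector here are hand-picked toy numbers standing in for real data and for COEFFICIENTS, and `column_means` and `predict` are hypothetical helper names.

```python
def column_means(rows):
    """Mean of every column (wavelength) over the training spectra."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def predict(x_new, x_mean, y_mean, coefficients):
    """Yhat = y_mean + (x_new - x_mean) . B, i.e. Yhat0 = X0 B un-centered."""
    centered = [x - m for x, m in zip(x_new, x_mean)]
    return y_mean + sum(c * b for c, b in zip(centered, coefficients))

# Toy training set: 3 "spectra" with 2 "wavelengths" each.
x_train = [[1.0, 2.0], [2.0, 0.0], [3.0, 4.0]]
y_train = [4.0, 5.0, 9.0]
b = [2.0, 0.5]   # stand-in for COEFFICIENTS (hand-picked, not fitted by PLS)

x_mean = column_means(x_train)
y_mean = sum(y_train) / len(y_train)

# First step: reproduce the training values (the analogue of FITTED).
fitted = [predict(row, x_mean, y_mean, b) for row in x_train]
# Then predict a new, unseen spectrum with the training means.
y_new = predict([4.0, 2.0], x_mean, y_mean, b)
```

Note that the new spectrum is centered with the training means, not with its own mean.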
So, I have a vector that corresponds to a given feature (same dimensionality). Is there a package in Julia that would provide a mathematical function that fits these data points, in relation to the original feature? In other words, I have x and y (both vectors) and need to find a decent mapping between the two, even if it's a highly complex one. The output of this process should be a symbolic formula that connects x and y, e.g. (:x)^3 + log(:x) - 4.2454. It's fine if it's just a polynomial approximation.
I imagine this is a walk in the park if you employ Genetic Programming, but I'd rather opt for a simpler (and faster) approach, if it's available. Thanks
Turns out the Polynomials.jl package includes the function polyfit which does Lagrange interpolation. A usage example would go:
using Polynomials # install with Pkg.add("Polynomials")
x = [1,2,3] # demo x
y = [10,12,4] # demo y
polyfit(x,y)
The last line returns:
Poly(-2.0 + 17.0x - 5.0x^2)
which evaluates to the correct values.
The polyfit function accepts a maximal degree for the output polynomial, but defaults to the length of the input vectors x and y minus 1. This is the same degree as the polynomial from the Lagrange formula. Since two polynomials of at most that degree that agree on all the inputs must be identical (this is a basic theorem), we can be certain that the result is the Lagrange polynomial, and in fact the only polynomial of such a degree with this property.
Thanks to the developers of Polynomials.jl for leaving me just to google my way to an answer.
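As a cross-check of that uniqueness argument, the Lagrange formula can be expanded by hand. The sketch below is in Python rather than Julia and uses exact fractions so that no rounding gets in the way; `poly_mul` and `lagrange_coefficients` are hypothetical helper names, not part of any package.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def lagrange_coefficients(xs, ys):
    """Coefficients (lowest degree first) of the Lagrange interpolating polynomial."""
    n = len(xs)
    total = [Fraction(0)] * n
    for i in range(n):
        basis = [Fraction(1)]   # numerator: product of (x - x_j) for j != i
        denom = Fraction(1)     # denominator: product of (x_i - x_j) for j != i
        for j in range(n):
            if j != i:
                basis = poly_mul(basis, [Fraction(-xs[j]), Fraction(1)])
                denom *= xs[i] - xs[j]
        for k, coeff in enumerate(basis):
            total[k] += ys[i] * coeff / denom
    return total

coeffs = lagrange_coefficients([1, 2, 3], [10, 12, 4])
print([str(c) for c in coeffs])   # constant term first
```

For the demo data the expansion gives the same coefficients as the polyfit output shown above, constant term first.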
Take a look at MARS regression (multivariate adaptive regression splines).