I use Albumentations augmentations in my computer vision tasks. However, I don't fully understand when to apply normalization to my images (I use min-max normalization). Should I normalize before the augmentation functions, even though the values would then no longer be between 0 and 1? Should I normalize only after the augmentations, so that the values end up between 0 and 1? Or should I normalize in both cases, before and after the augmentations?
For example, when I use Sharpen, the values are not in the 0-1 range (they vary roughly between -0.5 and 1.5). Does that affect model performance? If so, how?
Thanks in advance.
The basic idea is that the input to your neural network should be centered around 0 with a variance of 1. There is a mathematical reason why this helps the learning process of a neural network; this is not the case for other algorithms such as tree boosting.
If you train from scratch, the type of normalization (min-max or other) should not impact model performance (except if, for example, your max/min value is really extreme compared to your other data points).
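If it helps, here is a minimal sketch of the "augment first, normalize last" pattern with Albumentations (assuming a recent version and a uint8 RGB input; with mean=0, std=1 and max_pixel_value=255, Normalize is effectively min-max scaling to [0, 1]):

```python
import albumentations as A
from albumentations.pytorch import ToTensorV2

# Augment the raw uint8 image first, normalize last, so the network
# always receives inputs on one consistent scale.
transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.Sharpen(p=0.3),                    # may push intermediate values outside 0-1; that's fine
    A.Normalize(mean=0.0, std=1.0,
                max_pixel_value=255.0),  # plain min-max style scaling to [0, 1]
    ToTensorV2(),
])

# augmented = transform(image=image)["image"]  # image: HxWxC uint8 numpy array
```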
Related
I'm working with different types of financial data inputs for my models, and I would like to know more about normalizing them.
In particular, working with some technical indicators, I’ve normalized them to have a range between 0 and 1.
Others were normalized to have a range between -1 and 1.
What is your experience with mixed normalized data?
Could it be acceptable to have these two ranges, or is it always better to have the training dataset in a single range, i.e. [0, 1]?
It is important to note that when we discuss data normalization, we are usually referring to the normalization of continuous data; categorical data (usually) doesn't require it.
Furthermore, not all ML methods need normalized data to function well: Random Forests and Gradient Boosting Machines, for example, do not, whereas Support Vector Machines and Neural Networks do.
The reasons for input data normalization are dependent on the methods themselves. For SVMs, data normalization is done to ensure that input features are given equal importance in influencing the model's decisions. For neural networks, we normalize data to allow the gradient descent process to converge smoothly.
Finally, to answer your question: if you are working with continuous data and using a neural network to model it, just make sure that the normalized values are close to each other (even if they do not share exactly the same range), because that is what determines how easily the gradient descent process converges. If you are working with an SVM, it is better to normalize your data to a single range, so that all features are given equal importance by the similarity/distance function your SVM uses. In other cases, the need for data normalization, whatever the ranges, may be removed entirely. Ultimately, it depends on the modeling technique you are using!
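As a concrete illustration of the single-range point, here is a small sketch with scikit-learn (the toy arrays and the choice of MinMaxScaler are mine, not from the original answer):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Toy features on very different scales (e.g. a price and a technical indicator).
X_train = np.array([[10.0, -0.5], [200.0, 0.3], [50.0, -0.9]])
y_train = np.array([0, 1, 0])

scaler = MinMaxScaler(feature_range=(0, 1))   # one shared range for every feature
X_scaled = scaler.fit_transform(X_train)

clf = SVC().fit(X_scaled, y_train)
# At prediction time, reuse the fitted scaler: clf.predict(scaler.transform(X_new))
```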
Credit to #user3666197 for the helpful feedback in the comments.
I'm working on a regression problem in PyTorch. My target values can be either between 0 and 100 or between 0 and 1 (they represent a percentage, or a percentage divided by 100).
The data is unbalanced, I have much more data with lower targets.
I've noticed that when I run the model with targets in the range 0-100, it doesn't learn: the validation loss doesn't improve, and the loss on the 25% largest targets is very big, much bigger than the standard deviation within this group.
However, when I run the model with targets in the range 0-1, it does learn and I get good results.
If anyone can explain why this happens, and if using the ranges 0-1 is "cheating", that will be great.
Also - should I scale the targets? (either if I use the larger or the smaller range).
Some additional info: I'm trying to fine-tune BERT for a specific task, and I use MSELoss.
Thanks!
I think your observation relates to batch normalization. There is a paper written on the subject, and numerous Medium/Towards Data Science posts, which I will not list here. The idea is that if you have no non-linearities in your model and loss function, scaling doesn't matter. But even MSE contains a non-linearity, which makes it sensitive to the scaling of both target and source data. You can experiment with inserting batch normalization layers into your models, after dense or convolutional layers. In my experience it often improves accuracy.
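A minimal sketch of the target rescaling the question itself found to work, dividing the percentages by 100 before the MSE loss (PyTorch assumed; the toy model and tensors are placeholders, not from the original post):

```python
import torch
import torch.nn as nn

TARGET_SCALE = 100.0                         # targets originally live in [0, 100]

model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
criterion = nn.MSELoss()

inputs = torch.randn(16, 8)                  # toy batch
targets_percent = torch.rand(16) * 100.0     # toy targets in [0, 100]

preds = model(inputs).squeeze(-1)
loss = criterion(preds, targets_percent / TARGET_SCALE)   # train against [0, 1] targets
loss.backward()

# At inference time, map predictions back: percent = model(x).squeeze(-1) * TARGET_SCALE
```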
I have a dataset with both binary data (0/1) and numeric data with different units. If I want to apply some machine learning techniques to classify my data (potentially an autoencoder or hierarchical clustering), should I standardize or normalize the data?
Thank you!
It depends.
For neural networks you may want to standardize continuous variables for numerical reasons, but it depends on your platform. Consider Google's TPUs: they work with 1-byte precision, so you want the relevant input domain to use this limited range optimally.
For distance-based methods like clustering, preprocessing the data is crucial but difficult. Standardizing is not always the right thing to do, although it is fairly common to apply some normalization; you need a domain expert to find the best one.
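For the mixed binary/continuous case asked about above, one common pattern (a sketch with scikit-learn; the toy columns are invented for illustration) is to standardize only the continuous columns and pass the binary ones through untouched:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

X = np.array([
    [1, 0, 182.0, 75000.0],   # [binary, binary, height_cm, income] - made-up columns
    [0, 1, 165.0, 42000.0],
    [1, 1, 170.0, 58000.0],
])

pre = ColumnTransformer(
    [("scale", StandardScaler(), [2, 3])],   # continuous columns with different units
    remainder="passthrough",                 # keep the 0/1 columns as they are
)

X_prepared = pre.fit_transform(X)            # scaled columns first, then the binary ones
```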
My question is: how can I perform a cluster analysis on spatio-temporal, high-dimensional data? My purpose is to find subspace clusters that show patterns in space and in time. Here, space means a geographic position, so I should use the autocorrelation law (also known as Tobler's law, or the first law of geography).
Is this right? First I transform from time to frequency with a wavelet transform of every variable (because all variables have a time and a geographic position associated with them), and then I take those coefficients and apply a subspace clustering algorithm for temporal high-dimensional clustering. Once I have the temporal clusters, I try to find spatial "clusters" through regionalization between the temporal clusters.
Thanks in advance for any light.
I understand that you are using Tobler's law as an interpretation of the spatial correlation (regionalization). It's not clear what the final application would be, but a few verification steps I would take in such circumstances are: check whether all (150) variables correspond to the same scale in space and time and are affected by the same kind of autocorrelation (stationarity), which can simplify the problem in a few cases; and finally, understand what features or patterns are to be extracted and how they are characterized. Check this out: http://www.geokernels.org/pages/modern_indexpag.html
Hope it helped!
Cheers
Ravi
It's not clear what you would like to achieve here. In general, for spatio-temporal clustering one could use a distribution-based model such as a multivariate Gaussian Mixture Model for a given patch of the dataset and update the covariance matrix parameters (http://en.wikipedia.org/wiki/Multivariate_normal_distribution). In the case of clustering the wavelet transform coefficients, we ignore any spatial correlation.
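A rough sketch of the "wavelet coefficients, then distribution-based clustering" idea (PyWavelets and scikit-learn assumed; the array shapes, wavelet choice, and diagonal covariance are all illustrative, not from the answer above):

```python
import numpy as np
import pywt
from sklearn.mixture import GaussianMixture

# One time series per spatial location: shape (n_locations, n_timesteps).
series = np.random.rand(100, 256)

# Decompose each series and concatenate the wavelet coefficients into one feature vector.
features = np.array([np.concatenate(pywt.wavedec(s, "db4", level=3)) for s in series])

# Diagonal covariances keep this toy example well-conditioned; note that clustering
# the coefficients like this ignores spatial correlation, as mentioned above.
gmm = GaussianMixture(n_components=5, covariance_type="diag", random_state=0)
labels = gmm.fit_predict(features)    # one temporal cluster label per location
```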
I am not sure what you mean here by "regionalization".
You could treat time as just another dimension, depending on your application.
What about constructing temporal cluster data with a correlation coefficient against the cluster that gives a variance equal to 1? A spatial cluster would then be a scatter plot, which might derive from lognormal, skewed, or regression plots.
I recently read that you can predict the outcomes of a PRNG if you:
1. Know what algorithm is being used.
2. Have consecutive data points.
Is it possible to figure out the seed used for a PRNG from only data points?
I managed to find a paper by Kelsey et al. which details the different types of attack and also summarises some real-world examples. It seems most attacks rely on techniques similar to those used against cryptosystems, and in most cases actually take advantage of the fact that the PRNG is used in a cryptosystem.
With "enough" data points that are the absolute first data points generated by the PRNG with no gaps, sure. Most PRNG functions are invertible, so just work backwards and you should get the seed.
For example, the typical return seed=(seed*A+B)%N has an inverse of return seed=((seed-B)*A_inv)%N, where A_inv is the modular multiplicative inverse of A modulo N (it exists whenever A and N are coprime, which is the usual case).
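A small sketch of stepping such an LCG backwards (Python 3.8+ for pow(A, -1, N); the constants are illustrative, not from the answer):

```python
A, B, N = 1103515245, 12345, 2**31       # example LCG constants; gcd(A, N) == 1

def lcg_next(seed):
    return (seed * A + B) % N

def lcg_prev(seed):
    # Multiply by the modular multiplicative inverse of A instead of dividing.
    return ((seed - B) * pow(A, -1, N)) % N

s0 = 42
assert lcg_prev(lcg_next(s0)) == s0      # stepping forward then backward recovers the seed
```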
It's always theoretically possible, if you're "allowed" to brute force all possible values for the seed, and if you have enough data points that there's only one seed that could have produced that output. If the PRNG was seeded with the time, and you know roughly when that happened, then this might be very fast since there aren't many plausible values to try. If the PRNG was seeded with data from a truly random source having 64 bits of entropy, then this approach is computationally infeasible.
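A toy sketch of the "seeded with the time" brute force (Python's random module is used purely for illustration; a real target would use whatever PRNG and seeding scheme the system actually used):

```python
import random
import time

now = int(time.time())
secret_seed = now - 1234                      # pretend the PRNG was seeded ~20 minutes ago
rng = random.Random(secret_seed)
observed = [rng.random() for _ in range(3)]   # outputs an attacker managed to capture

# Try every plausible "seeded with the current time" value from the last hour.
for candidate in range(now - 3600, now + 1):
    guess = random.Random(candidate)
    if [guess.random() for _ in range(len(observed))] == observed:
        print("recovered seed:", candidate)
        break
```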
Whether there are other techniques depends on the algorithm. For example doing this for Blum Blum Shub is equivalent to integer factorization, which is generally believed to be a hard computational problem. Other, faster PRNGs might be less "secure" in this sense. Any PRNG used for crypto purposes, for example in a stream cipher, pretty much needs there to be no known feasible way of doing it.