sRGB's linear segment to avoid infinite slope - why? - numerical-methods

The Wikipedia article about sRGB (https://en.wikipedia.org/wiki/SRGB) states that the gamma transformation has a linear portion near zero to "avoid having an infinite slope at K = 0, which can cause numerical problems". I'd like to know what the problem actually is.

There are two answers, as is usual for gamma. The modern variant is:
The problem is that with an infinite slope you need "infinite" resolution (many bits of storage) to arrive at a linear representation that can be inverted back to the gamma-encoded values without loss. In other words, the linear segment allows a small lookup table to produce an invertible linear encoding (8 bit -> 10 bit -> 8 bit).
The numerical problem is most easily understood on the first step (8 bit -> 10 bit). With an infinite slope near zero, you need a much bigger encoding range to stay faithful and reversible, i.e. you would need more than 16 bits (assuming integer coding; half-precision floats do not have this problem).
The linear equivalent of #010101, i.e. 1/255, with square (gamma = 2.0) coding is 1/(255*255). You would need 16 bits to represent that faithfully, and using 2.2 rather than 2.0 as the exponent makes it worse. These very small numbers are just a corollary of the coding function; in practice you do not need much resolution in a lightness range that is, roughly, black. So the linear segment helps coding by not wasting resolution on detail around black (near zero).
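To make this concrete, here is a small Python sketch (my own illustration, not part of the original answer) of the standard IEC 61966-2-1 sRGB encode/decode pair. The piecewise-linear segment caps the encoding slope at 12.92 near zero, whereas a pure power law has unbounded slope there, and an 8-bit round trip through the curve stays exact:

    import numpy as np

    def srgb_encode(linear):
        # Linear light -> sRGB-encoded value, both in [0, 1].
        linear = np.asarray(linear, dtype=np.float64)
        return np.where(linear <= 0.0031308,
                        12.92 * linear,                            # linear segment: slope capped at 12.92
                        1.055 * np.power(linear, 1 / 2.4) - 0.055)

    def srgb_decode(encoded):
        # sRGB-encoded value -> linear light.
        encoded = np.asarray(encoded, dtype=np.float64)
        return np.where(encoded <= 0.04045,
                        encoded / 12.92,
                        np.power((encoded + 0.055) / 1.055, 2.4))

    # The slope of a pure power law x**(1/2.4) blows up as x -> 0,
    # while the sRGB encoding slope stays at 12.92.
    x = 1e-6
    print(x ** (1 / 2.4) / x)       # roughly 3000, and growing as x shrinks
    print(srgb_encode(x) / x)       # 12.92

    # Round-tripping all 256 8-bit codes through linear light and back is lossless.
    codes = np.arange(256) / 255.0
    recovered = np.round(srgb_encode(srgb_decode(codes)) * 255).astype(int)
    print(np.array_equal(recovered, np.arange(256)))   # True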
The slightly older answer, taken from http://www.poynton.com/notes/colour_and_gamma/GammaFAQ.html#gamma_correction, is that in some equipment this linear segment is less sensitive to noise. This is probably mostly true of the analog signal path.

Related

Can a neural network having non-linear activation function (say ReLU) be used for linear classification task?

I think the answer would be yes, but I'm unable to reason out a good explanation for this.
The mathematical argument lies in the network's power to represent linearity; we can use the following three lemmas to show it:
Lemma 1
With an affine transformation (a linear layer) we can map the input hypercube [0,1]^d into an arbitrarily small box [a,b]^k. The proof is simple: set all the biases equal to a and scale the weights by (b-a).
Lemma 2
At a sufficiently small scale, many non-linearities are approximately linear. This is essentially the definition of a derivative, or a first-order Taylor expansion. In particular, take relu(x): for x > 0 it is exactly linear. What about the sigmoid? If we look at a tiny region [-eps, eps], it approaches a linear function as eps -> 0.
Lemma 3
Composition of affine functions is affine. In other words, a neural network with multiple consecutive linear layers is equivalent to one with a single linear layer. This follows from how the matrices compose:
W2(W1x + b1) + b2 = W2W1x + W2b1 + b2 = (W2W1)x + (W2b1 + b2), where W2W1 is the new weight matrix and W2b1 + b2 is the new bias.
Combining the above
Composing the three lemmas, we see that with a non-linear layer there always exists an arbitrarily good approximation of a linear function. We simply use the first layer to map the entire input space into the tiny part of the pre-activation space where the non-linearity is approximately linear, and then "map it back" in the following layer.
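Here is a small numerical sketch of that construction (my own illustration; the target line and the constants a and c are arbitrary choices). With ReLU the trick is in fact exact: the first affine layer shifts every pre-activation into the region where ReLU is the identity, and the second affine layer undoes the scale and shift:

    import numpy as np

    def relu(z):
        return np.maximum(z, 0.0)

    # Target linear function on [0, 1]: y = 3x - 1 (arbitrary example).
    w_true, b_true = 3.0, -1.0

    # Layer 1 (Lemma 1): shrink and shift so every pre-activation is positive,
    # i.e. it lands where ReLU acts as the identity (Lemma 2).
    a, c = 0.01, 0.5
    # Layer 2 (Lemma 3): an affine layer that undoes the scale and shift exactly.
    w2 = w_true / a
    b2 = b_true - w2 * c

    x = np.linspace(0.0, 1.0, 11)
    hidden = relu(a * x + c)           # a*x + c > 0 on [0, 1], so ReLU is the identity here
    y_net = w2 * hidden + b2
    y_target = w_true * x + b_true

    print(np.max(np.abs(y_net - y_target)))   # ~0, exact up to floating-point rounding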
General case
This is a very simple proof; in the general case you can use the Universal Approximation Theorem to show that a sufficiently large non-linear neural network (sigmoid, ReLU, many others) can approximate any smooth target function, which includes linear ones. That proof (originally given by Cybenko) is however much more complex and relies on showing that specific classes of functions are dense in the space of continuous functions.
Technically, yes.
The reason you could use a non-linear activation function for this task is that you can post-process the results. Say the activation function outputs values between 0.0 and 1.0; then you can round up or down to get a binary 0/1. To be clear, rounding up or down is not a linear activation, but for this specific question the purpose of the network is classification, where some kind of thresholding has to be applied anyway.
The reason you shouldn't is the same reason you shouldn't attach an industrial heater to a fan and call it a hair dryer: it is unnecessarily powerful and can waste resources and time.
I hope this answer helped, have a good day!

Any known linear algorithm to approximate a function with line segments?

I have a function given by a list of points, ex:
f = [0.03, 0.05, 0.02, 1.3, 1.0, 5.6, ..., 13.4, 12.45]
I need an algorithm (with linear complexity) to "cut" this function/list into K intervals/sublists so that each interval/sublist contains points that "lie near a line segment" (take a look at the image)
The number K may either be decided by the algorithm itself or be a parameter of the algorithm (preferably decided by the algorithm itself).
Is there such a known algorithm I could use?
I am writing from a smartphone, so this will be short. Basically, a function is nearly linear if the differences between consecutive values are approximately equal; see http://psn.virtualnerd.com/viewtutorial/PreAlg_13_01_0006
As an algorithm for traversing an unsorted array, the sliding-window technique (https://www.geeksforgeeks.org/window-sliding-technique/) is a good fit and can be implemented in a single pass (a 1-pass solution).
Update in response to a comment:
With a sliding window you can also implement the vagueness or fuzziness of the values that you mentioned in the comment (which is why I said "nearly linear" and "approximately"), i.e.
    if (abs((x[i+1] - x[i]) - (x[i+2] - x[i+1])) < 0.5)   /* compare consecutive signed differences */
        linearity_flag = 1;
    else
        linearity_flag = 0;
where x[i+1]-x[i] and x[i+2]-x[i+1] are two consecutive differences of consecutive values, and 0.5 is a deliberately chosen threshold that fixes what you accept as a straight line or linear function in an x-y graph (or how much 'jitter' of the line you allow). So you have to use the difference of differences of consecutive values. Instead of 3 points you can also include more points with this approach (a wider sliding window).
If you want a strict mathematical ansatz, you could use other curve analysis techniques: https://openstax.org/books/calculus-volume-1/pages/4-5-derivatives-and-the-shape-of-a-graph (in fact, the difference of differences of consecutive values is a discrete realization of a second derivative).
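For what it's worth, here is a single-pass Python sketch of the same idea (my own code, not a library routine; the threshold 0.5 is as arbitrary as in the snippet above). It cuts the list wherever the discrete second difference exceeds the threshold, so it runs in O(n) and K falls out of the data rather than being chosen up front:

    def segment_by_second_difference(f, threshold=0.5):
        # Split the sample list f into maximal runs whose discrete second
        # differences stay below `threshold`, i.e. runs that are roughly linear.
        # Single pass over the data, so O(n); returns (start, end) index pairs.
        if len(f) < 3:
            return [(0, len(f) - 1)] if f else []
        segments = []
        start = 0
        for i in range(len(f) - 2):
            d1 = f[i + 1] - f[i]
            d2 = f[i + 2] - f[i + 1]
            if abs(d2 - d1) >= threshold:      # 2nd difference too large: cut here
                segments.append((start, i + 1))
                start = i + 1
        segments.append((start, len(f) - 1))
        return segments

    # Two roughly linear pieces with a slope change at index 3:
    f = [0.0, 1.0, 2.0, 3.0, 3.1, 3.2, 3.3]
    print(segment_by_second_difference(f))    # [(0, 3), (3, 6)]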

Best technique to approximate a 32-bit function using machine learning?

I was wondering which is the best machine learning technique to approximate a function that takes a 32-bit number and returns another 32-bit number, from a set of observations.
Thanks!
Multilayer perceptron neural networks would be worth taking a look at, though you'll need to scale the inputs to floating-point numbers between 0 and 1 and then map the outputs back to the original range.
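A sketch of that preprocessing, under my own assumption that the values are treated as unsigned 32-bit integers:

    import numpy as np

    U32_MAX = 2**32 - 1

    def to_unit(x_uint32):
        # Map unsigned 32-bit integers into [0, 1] for the network input.
        return np.asarray(x_uint32, dtype=np.float64) / U32_MAX

    def from_unit(y_unit):
        # Map network outputs in [0, 1] back to unsigned 32-bit integers.
        return np.clip(np.round(np.asarray(y_unit) * U32_MAX), 0, U32_MAX).astype(np.uint64)

    # The round trip is exact here because float64 represents 32-bit integers exactly;
    # with float32 you would lose low-order bits.
    x = np.array([0, 1, 123456789, U32_MAX], dtype=np.uint64)
    print(from_unit(to_unit(x)))    # [0 1 123456789 4294967295]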
There are several possible solutions to your problem:
1.) Fitting a linear hypothesis with least-squares method
In that case, you are approximating the hypothesis y = ax + b with the least-squares method. This one is really easy to implement, but sometimes a linear model is not good enough to fit your data. Still, I would give this one a try first.
The good thing is that there is a closed form, so you can directly calculate the parameters a and b from your data.
See Least Squares
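A minimal sketch of that closed form with NumPy (my own illustration on made-up data; np.linalg.lstsq solves the same normal equations for you):

    import numpy as np

    # Made-up observations of the unknown mapping.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    # Solve min over (a, b) of sum (a*x_i + b - y_i)^2 via the normal equations.
    A = np.column_stack([x, np.ones_like(x)])
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    print(a, b)    # slope and intercept of the fitted line y ~ a*x + b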
2.) Fitting a non-linear model
Once you see that a linear model does not describe your function very well, you can try to fit higher-order polynomial models to your data.
Your hypothesis then might look like
y = ax² + bx + c
y = ax³ + bx² + cx + d
etc.
You can also use the least-squares method to fit these, as well as techniques of the gradient-descent type (simulated annealing, ...). See also this thread: Fitting polynomials to data
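For example, a polynomial least-squares fit is a one-liner with NumPy (made-up data; the degree 3 is an arbitrary choice for illustration):

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([1.2, 0.8, 2.5, 7.9, 18.1, 33.0])

    coeffs = np.polyfit(x, y, deg=3)    # coefficients [a, b, c, d] of a*x^3 + b*x^2 + c*x + d
    poly = np.poly1d(coeffs)
    print(poly(2.5))                    # evaluate the fitted hypothesis at a new point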
Or, as in the other answer, try fitting a Neural Network - the good thing is that it will automatically learn the hypothesis, but it is not so easy to explain what the relation between input and output is. But in the end, a neural network is also a linear combination of nonlinear functions (like sigmoid or tanh functions).

why DCT transform is preferred over other transforms in video/image compression

I went through how DCT (discrete cosine transform) is used in image and video compression standards.
But why is the DCT preferred over other transforms such as the DFT or DST?
Because cos(0) is 1, the first (0th) coefficient of DCT-II is the mean of the values being transformed. This makes the first coefficient of each 8x8 block represent the average tone of its constituent pixels, which is obviously a good start. Subsequent coefficients add increasing levels of detail, starting with sweeping gradients and continuing into increasingly fiddly patterns, and it just so happens that the first few coefficients capture most of the signal in photographic images.
sin(0) is 0, so the DSTs start with an offset of 0.5 or 1, and the first coefficient is a gentle mound rather than a flat plain. That is unlikely to suit ordinary images, and the result is that DSTs require more coefficients than DCTs to encode most blocks.
The DCT just happens to suit. That is really all there is to it.
When performing image compression, our best bet is to perform the KLT or the Karhunen–Loève transform as it results in the least possible mean square error between the original and the compressed image. However, KLT is dependent on the input image, which makes the compression process impractical.
The DCT is the closest input-independent approximation to the KL transform. Mostly we are interested in low-frequency content, so only the even (cosine) components are necessary, and it is computationally feasible to compute only the DCT.
Also, the use of cosines rather than sine functions is critical for compression as fewer cosine functions are needed to approximate a typical signal (See Douglas Bagnall's answer for further explanation).
Another advantage of using cosines is the lack of discontinuities. With the DFT, the signal is represented as if it were periodic, so its implied extension has jumps at the block edges; when representation coefficients are truncated, the signal tends to "lose its form". With the DCT, the implied periodic extension is continuous (the block is mirrored), so the signal withstands relatively more coefficient truncation while keeping its desired shape.
The DCT of an image macroblock whose top and bottom and/or left and right edges don't match will have less energy in the higher-frequency coefficients than a DFT would. This allows more of those high coefficients to be removed, coarsely quantized, or compressed without creating visible macroblock boundary artifacts.
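A quick numerical check of the energy-compaction claim (my own sketch using SciPy; the ramp stands in for one row of a block whose left and right edges don't match): the DCT concentrates far more of the ramp's energy into its few largest coefficients than the DFT does.

    import numpy as np
    from scipy.fft import dct, fft

    # One row of a "macroblock" whose ends don't match: a simple ramp.
    x = np.linspace(0.0, 1.0, 8)

    dct_coeffs = dct(x, norm='ortho')            # orthonormal DCT-II
    dft_coeffs = fft(x) / np.sqrt(len(x))        # unitary DFT, so energies are comparable

    def energy_in_largest(c, k):
        # Fraction of total energy carried by the k largest-magnitude coefficients.
        e = np.sort(np.abs(c) ** 2)[::-1]
        return np.sum(e[:k]) / np.sum(e)

    for k in (1, 2, 3):
        print(k, energy_in_largest(dct_coeffs, k), energy_in_largest(dft_coeffs, k))
    # k = 1: both keep ~70% (the DC term is the same).
    # k = 2, 3: the DCT keeps well over 99% while the DFT stays near 80-90%,
    # because the DFT's implied periodic extension has a jump at the block edge.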
In summary, the DCT is preferred over the DFT (discrete Fourier transform) and the KLT (Karhunen-Loève transform) because it offers:
1. Fast algorithm
2. Good energy compaction
3. Only real coefficients

how to get max. value for non-linear data

I am a new Matlab user, so I am quite unfamiliar with most of its power. I need to get the maximum value in a non-linear moment-curvature curve. I define the theoretical max. and min. curvature values in the program and then divide the range into small discrete increments, but the maximum sometimes occurs in between two increments, so the program misses it and stops before finding the true maximum. How can I overcome this problem?
You will need to approximate the curve, using an interpolation/fitting scheme that depends on the problem, the curve shape, and any known functional form. A spline might be appropriate, or perhaps not.
Once you have a viable approximation that connects the dots so to speak, you minimize/maximize that function. This is an easily solved problem at that point.
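As a sketch of that approach (in Python/SciPy here; MATLAB has analogous tools such as spline/interp1 and fminbnd; the sample data below is made up): fit a spline through the sampled moment-curvature points, then maximize the spline over the sampled range so a peak lying between two increments is not missed.

    import numpy as np
    from scipy.interpolate import CubicSpline
    from scipy.optimize import minimize_scalar

    # Made-up moment-curvature samples; the true peak falls between two increments.
    curvature = np.linspace(0.0, 0.01, 21)
    moment = 100.0 * np.sin(curvature / 0.01 * 2.2)

    spline = CubicSpline(curvature, moment)

    # Maximize the spline (minimize its negative) over the sampled range.
    res = minimize_scalar(lambda k: -float(spline(k)),
                          bounds=(curvature[0], curvature[-1]), method='bounded')
    print(res.x, -res.fun)   # curvature at the peak and the peak moment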
There is a method for optimizing non-linear functions (finding minima/maxima).
It uses a non-linear least-squares method and I think it is called lsqnonlin(); you can find it in the Optimization Toolbox. solve() might also work. Another option is simulated annealing, but I don't remember the name of the function.
Sorry I don't supply code; I am answering from an iPhone.