FFT output for a signal with 2 cosf() cycles

I am transforming a signal using the ZeroFFT library. The results I get from it are not what I would intuitively expect.
As a test, I feed the FFT algorithm a buffer that contains two full cycles of a cosine, sampled over 512 samples and passed in as int16_t values. What I expected to get back is 256 amplitudes, with the values [ 0, 4095, 0, 0, ..., 0 ].
Instead, this is the result:
2 2052 4086 2053 0 2 2 2 1 2 2 2 4 4 3 4...
And it gets weirder! If I feed it the same signal, but shifted (so sinf() over 0..4*pi instead of the cosf() function), I get a completely different result: 4 10 2 16 2 4 4 4 2 2 2 3 2 4 3 4
This raises two questions:
1. Don't a sine signal and a cosine signal with the same period contain exactly the same frequencies?
2. If I feed it a buffer with exactly 2 cycles of cosine, wouldn't the Fourier transform result in all zeros, except for 1 frequency?
I generate my test signal as:
static void setup_reference(void)
{
    for (int i = 0; i < CAPTURESZ; ++i)
    {
        const float phase = 2 * 3.14159f * i / 256.0f;
        reference_wave[i] = (int16_t) (cosf(phase) * 0x7fff);
    }
}
And call the ZeroFFT function as:
ZeroFFT(reference_wave, CAPTURESZ);
Note: the ZeroFFT docs state that a Hanning window is applied.

Windowing causes some spectral leakage. With the Hanning window applied, the buffer no longer contains a pure two-cycle cosine: the amplitude tapers to zero at both ends, which smears the single spectral line into the neighbouring bins (hence the 2052 4086 2053 around bin 2 instead of a lone peak).
If I feed it a buffer with exactly 2 cycles of cosine, wouldn't the Fourier transform result in all zeros, except for 1 frequency?
Yes, if you do it without windowing. Actually two frequencies: both the positive frequency that you expect and the equivalent negative frequency, though not all FFT functions will include the negative frequencies in their output (for real input the result is Hermitian-symmetric, so there is no extra information in the negative frequencies). For practical reasons, since neither the input signal nor the FFT calculation is exact, you may not get exactly zero everywhere else either, but it should be close - that's mainly a concern for floating point output.
By the way, I don't mean by this that windowing is bad; it's just that in this special case (perfectly periodic input) it didn't work out in your favour.
As for the sine wave, the magnitudes of the result should be similar (within reason - exactness shouldn't be expected), but the comments on the FFT function you used mention:
The complex portion is discarded, and the real values are returned.
While phase shifts would not change the magnitudes much, they do change the phases of the results, and therefore also their real component.
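To see both effects without the library in the way, here is a minimal sketch in plain C (a naive, unwindowed DFT computed bin by bin; this is not the ZeroFFT API). It shows that the two-cycle cosine and sine put the same magnitude into bin 2 but have completely different real parts:

#include <math.h>
#include <stdio.h>

#define N 512

/* Real and imaginary parts of DFT bin k of x[0..N-1]. */
static void dft_bin(const float *x, int k, float *re, float *im)
{
    *re = 0.0f;
    *im = 0.0f;
    for (int n = 0; n < N; ++n) {
        float w = 2.0f * 3.14159265f * k * n / N;
        *re += x[n] * cosf(w);
        *im -= x[n] * sinf(w);
    }
}

int main(void)
{
    float cosine[N], sine[N];
    for (int i = 0; i < N; ++i) {
        float phase = 2.0f * 3.14159265f * i / 256.0f;  /* 2 cycles over 512 samples */
        cosine[i] = cosf(phase);
        sine[i] = sinf(phase);
    }
    for (int k = 0; k <= 4; ++k) {
        float cr, ci, sr, si;
        dft_bin(cosine, k, &cr, &ci);
        dft_bin(sine, k, &sr, &si);
        printf("bin %d: cos mag=%6.1f re=%6.1f | sin mag=%6.1f re=%6.1f\n",
               k, hypotf(cr, ci), cr, hypotf(sr, si), sr);
    }
    return 0;
}

Bin 2 comes out with magnitude N/2 = 256 for both signals, but the real part is 256 for the cosine and roughly 0 for the sine; discarding the complex portion is what makes the two results look so different.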


LC-3 algorithm for converting ASCII strings to Binary Values

Figure 10.4 provides an algorithm for converting ASCII strings to binary values. Suppose the decimal number is arbitrarily long. Rather than store a table of 10 values for the thousands-place digit, another table of 10 values for the ten-thousands-place digit, and so on, design an algorithm to do the conversion without resorting to any tables whatsoever.
I have attached pictures of figure 10.4. I am not looking for an answer to the problem, but rather can someone please explain this problem and perhaps give some direction on how to go about creating the algorithm?
[Figure 10.4, first and second images]
I am unsure what it means by tables and don't really know where to start.
The tables are those global, initialized arrays: one called Lookup10 holding 10, 20, 30, 40, ..., and another called Lookup100 holding 100, 200, 300, 400...
You can ignore the tables: as per the assignment instructions, you're supposed to find a different way to accomplish this anyway.  Or, you can run that code in a simulator or mentally to understand how it works.
The bottom line is that LC-3, while it can do anything (it is Turing complete), can't do much in any one instruction.  For arithmetic & logic, it can do add, not, and.  That's pretty much it!  But that's enough; note that modern hardware does everything with only one logic gate, namely NAND, which is a binary operator (so NAND is directly available; NOT by providing NAND with the same operand for both inputs; AND by doing NOT after NAND; OR by using NOT on both inputs first and then NAND; etc.).
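As a toy illustration (in C, simulating 1-bit values, not real hardware), here is everything built from a single NAND primitive:

#include <stdio.h>

static int nand_g(int a, int b) { return !(a && b); }
static int not_g(int a)         { return nand_g(a, a); }               /* same operand twice */
static int and_g(int a, int b)  { return not_g(nand_g(a, b)); }        /* NOT after NAND */
static int or_g(int a, int b)   { return nand_g(not_g(a), not_g(b)); } /* NOT both inputs, then NAND */

int main(void)
{
    for (int a = 0; a <= 1; ++a)
        for (int b = 0; b <= 1; ++b)
            printf("a=%d b=%d: NAND=%d NOT(a)=%d AND=%d OR=%d\n",
                   a, b, nand_g(a, b), not_g(a), and_g(a, b), or_g(a, b));
    return 0;
}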
For example, LC-3 cannot multiply, divide, take a modulus, or right shift directly; each of those operations takes many instructions and, in the general case, some looping construct.  Multiplication can be done by repetitive addition, and division/modulus by repetitive subtraction.  These are super inefficient for larger operands; there are much more efficient algorithms, but they are also substantially more complex, so they greatly increase program complexity beyond that of the repetitive approach.
That subroutine goes backwards through the user's input string.  It takes a string length count in R1 as a parameter supplied by the caller (not shown).  It looks at the last character in the input and converts it from an ASCII character to a binary number.
(We would commonly do that conversion from ASCII character to numeric value using subtraction: moving the character values from the ASCII character range 0x30..0x39 to numeric values in the range 0..9, but they do it with masking, which also works.  The subtraction approach integrates better with error detection (checking for an invalid digit character, which is not done here), whereas the masking approach is simpler for LC-3.)
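For illustration, here are the two conversions in plain C (the LC-3 code uses an AND instruction for the masking variant; the subtraction variant would need the constant -0x30 loaded into a register first, since LC-3's 5-bit immediate can't hold -48):

#include <stdio.h>

int main(void)
{
    char c = '7';                  /* ASCII 0x37 */
    int by_subtraction = c - '0';  /* 0x37 - 0x30 = 7; out-of-range results reveal bad input */
    int by_masking = c & 0x0F;     /* 0x37 & 0x0F = 7; simpler, but accepts non-digits silently */
    printf("%d %d\n", by_subtraction, by_masking);
    return 0;
}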
The subroutine then obtains the 2nd last digit (moving backwards through the user's input string), converting that to binary using the mask approach.  That yields a number between 0 and 9, which is used as an index into the first table, Lookup10.  The value obtained from the table at that index position is basically the index × 10, so this table is a ×10 table.  The same approach is used for the third digit (the first one in the string, the last one visited going backwards), except it uses the 2nd table, which is a ×100 table.
The standard approach for string to binary is called atoi (search it), standing for ASCII to integer.  It moves forwards through the string, and for every new digit, it multiplies the value computed so far by 10 before adding in the next digit's numeric value.
So, if the string is 456, first it obtains 4; then, because there is another digit, 4 × 10 = 40, then + 5 for 45, then × 10 for 450, then + 6 for 456.
The advantage of this approach is that it can handle any number of digits (up to overflow).  The disadvantage, of course, is that it requires multiplication, which is a complication for LC-3.
Multiplication where one operand is the constant 10 is fairly easy even in LC-3's limited capabilities, and can be done with simple addition without looping.  Basically:
n × 10 = n + n + n + n + n + n + n + n + n + n
and LC-3 can do those 9 additions in just 9 instructions.  Still, we can also observe that:
n × 10 = n × 8 + n × 2
and also that:
n × 10 = (n × 4 + n) × 2     (which is n × 5 × 2)
which can be done in just 4 instructions on LC-3 (and none of these needs looping)!
So, if you want to do this approach, you'll have to figure out how to go forwards through the string instead of backwards as the given table version does, and how to multiply by 10 (use any one of the above suggestions), as sketched below.
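Here's a sketch of that forward approach in C rather than LC-3, with the ×10 done purely by additions as suggested above (the helper name is made up):

#include <stdio.h>

/* n*10 = (n*4 + n)*2, using only additions: four ADDs in LC-3 terms. */
static int times10(int n)
{
    int n2 = n + n;    /* n*2 */
    int n4 = n2 + n2;  /* n*4 */
    int n5 = n4 + n;   /* n*5 */
    return n5 + n5;    /* n*10 */
}

int main(void)
{
    const char *s = "456";
    int value = 0;
    for (int i = 0; s[i] != '\0'; ++i)
        value = times10(value) + (s[i] & 0x0F);  /* mask approach for the digit value */
    printf("%d\n", value);  /* prints 456 */
    return 0;
}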
There are other approaches as well if you study atoi.  You could keep the backwards approach, but then you have to multiply by 10, by 100, by 1000: a different factor for each successive digit.  That might be done by repetitive addition, or by keeping a count of how many times to multiply by 10 (e.g. n × 1000 = n × 10 × 10 × 10).

Can an LSTM train for regression with different numbers of features in each sample?

In my problem, each training and testing sample has a different number of features. For example:
Sample 1 has four features: x1, x2, x3, x4, with target y1.
Sample 2 has two features: x6, x3, with target y2.
Sample 3 has three features: x8, x1, x5, with target y3.
Here x denotes a feature and y a target.
Can these samples be used to train an LSTM for regression and make predictions?
Consider the following scenario: you have a (way too small) dataset of 6 sample sequences of lengths {1, 2, 3, 4, 5, 6} and you want to train your LSTM (or, more generally, an RNN) with a minibatch of size 3 (you feed 3 sequences at a time at every training step); that is, you have 2 batches per epoch.
Let's say that, due to randomization, on step 1 the batch ended up being constructed from the sequences of lengths {2, 1, 5}:
batch 1
----------
2 | xx
1 | x
5 | xxxxx
and, the next batch of sequences of length {6, 3, 4}:
batch 2
----------
6 | xxxxxx
3 | xxx
4 | xxxx
What people typically do is pad the sample sequences up to the longest sequence in the minibatch (not necessarily to the length of the longest sequence overall) and concatenate the sequences, one on top of another, to get a nice matrix that can be fed into the RNN. Let's say your features consist of real numbers, so it is not unreasonable to pad with zeros:
batch 1
----------
2 | xx000
1 | x0000
5 | xxxxx
(batch * length = 3 * 5)
batch 2
----------
6 | xxxxxx
3 | xxx000
4 | xxxx00
(batch * length = 3 * 6)
This way, for the first batch your RNN only has to run up to the necessary number of steps (5), saving some compute. For the second batch it has to go up to the longest one (6).
The padding value is chosen arbitrarily. It usually should not influence anything, unless you have bugs. Trying some bogus values, like Inf or NaN, may help you during debugging and verification.
Importantly, when using padding like that, there are some other things to do for the model to work correctly. If you are using backpropagation, you should exclude the results of the padding from both the output computation and the gradient computation (deep learning frameworks will do that for you). If you are running a supervised model, the labels should typically also be padded, and the padding should not be considered in the loss calculation. For example, say you calculate cross-entropy for the entire batch (with padding). In order to calculate a correct loss, the bogus cross-entropy values that correspond to padding should be masked with zeros; then each sequence should be summed independently and divided by its real length. That is, averaging should be performed without taking padding into account (in my example this is guaranteed by the neutrality of zero with respect to addition). The same rule applies to regression losses and metrics such as accuracy, MAE etc. (if you average together with the padding, your metrics will be wrong too).
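As a minimal sketch of that masking rule in plain C (hypothetical numbers; squared error standing in for the per-step loss), each padded sequence is averaged over its real length only, so padded positions never contribute:

#include <stdio.h>

#define BATCH 3
#define MAXLEN 5

int main(void)
{
    /* batch 1 from above: real lengths {2, 1, 5}, zero-padded to 5 steps */
    float pred[BATCH][MAXLEN] = {{1, 2, 0, 0, 0},
                                 {3, 0, 0, 0, 0},
                                 {1, 2, 3, 4, 5}};
    float target[BATCH][MAXLEN] = {{1, 1, 0, 0, 0},
                                   {2, 0, 0, 0, 0},
                                   {1, 1, 1, 1, 1}};
    int real_len[BATCH] = {2, 1, 5};

    for (int b = 0; b < BATCH; ++b) {
        float sum = 0.0f;
        /* the mask: only the first real_len[b] steps count */
        for (int t = 0; t < real_len[b]; ++t) {
            float d = pred[b][t] - target[b][t];
            sum += d * d;
        }
        /* divide by the real length, not MAXLEN */
        printf("sequence %d: loss = %g\n", b, sum / real_len[b]);
    }
    return 0;
}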
To save even more compute, sometimes people construct batches such that sequences in batches have roughly the same length (or even exactly the same, if dataset allows). This may introduce some undesired effects though, as long and short sequences are never in the same batch.
To conclude, padding is a powerful tool and if you are attentive, it allows you to run RNNs very efficiently with batching and dynamic sequence length.
Yes. Your input_size for the LSTM layer should be the maximum among all input sizes, and you replace the spare cells with zeros:
max(input_size) = 5
input array = [x1, x2, x3]
And you transform it this way:
[x1, x2, x3] -> [x1, x2, x3, 0, 0]
This approach is rather common and does not show any big negative influence on prediction accuracy.

Ternary computers: what would the third (unknown) part of a trit be used for?

I'm very interested in the idea of creating/designing (but most likely only imagining) a ternary computer rather than a binary computer.
If I were to do this, I would use a balanced base-3 system, so a trit (trit is to base-3 as bit is to base-2) could be -1, 0, or +1. Storing data using trits would be approximately 36% more compact than storing data using bits like we do on today's computers; however, ternary arithmetic would be much more complicated, so there's no telling whether an ALU using ternary computing would be faster or slower than binary.
But I digress, that's just a little background stuff that doesn't entirely pertain to the question, but it is related. :)
So, possible values for a trit:
-1 is off/false, same as 0 in binary.
0 is unknown. No equivalent in binary.
+1 is on/true, same as 1 in binary.
My question is: what is the point of that 0 in terms of computing? For example, I've been reading up a lot on logic gates and I understand both how they work and how they can work together to create an ALU. A binary AND gate is very simple, and combined with other binary logic gates it can be used to perform arithmetic, such as by creating an Adder, a unit that performs addition.
I can't even comprehend how this would be done using ternary computing. How would the unknown (0) factor into the logic gates and be used to perform arithmetic? Hell, I can't even comprehend what a ternary AND gate would output and how those outputs would be used.
For example, I would assume for a ternary computer an AND gate would accept 3 inputs instead of 2. Let's call the inputs A, B, and C.
In a binary AND gate, A and B can be 0 or 1. There are four possible combinations of what A and B could be input into the AND gate as. This results in only four possible outputs from the AND gate. If A and B are both 1 then the AND gate outputs a 1. If it is any of the other three combinations of A and B then it outputs 0. (possible results from AND gate considering all possible A/B combinations: 0, 0, 0, 1)
A ternary AND gate would take in 3 inputs, right? So in a ternary AND gate, A, B, and C could be -1, 0, or 1. This means that there are 27 possible combinations for A/B/C. Rather than listing out the possible outcomes for each combination, I'll just add them up for you guys. :)
Anyway, there is only one combination in which 1 will be the output, there are 7 combinations in which 0 will be the output (assuming that if A, B, and C are all 0 the AND gate outputs 0), and there are 19 combinations in which -1 will be the output. In a binary Adder, if the gate outputs a 1 it is sent off to another gate to be evaluated, and so on until the addition is complete. In ternary... what would a gate do if it received a 0?
I know that's a lot of reading, so I'll try to sum it up and list the main questions below:
How would the 0 of a trit in a balanced ternary system be used/handled in logic gates?
If a logic gate outputs a 0, and the gate is being used in an ALU to perform arithmetic (let's say addition for example), how would the gate that receives the 0 be expected to handle it? Basically, how would one go about creating an Adder using ternary logic?
And lastly, am I correct assuming that in a ternary computer logic gates would accept 3 inputs instead of 2 like binary computers, or would logic gates still be dyadic?
An essential goal of ALU design is to perform arithmetic on integers: in the first place addition (and subtraction), then multiplication and division.
When written in base 3, these operations are well defined. For instance:
+ | 0 1 2
------------
0 | 0 1 2
1 | 1 2 10
2 | 2 10 11
As with binary arithmetic, one needs to compute a sum trit and a carry. When a carry is propagated, the following table applies:
+c| 0 1 2
------------
0 | 1 2 10
1 | 2 10 11
2 | 10 11 12
So you indeed need two three-input functions (two trits in and a carry in), giving the sum trit and the carry out bit. (Notice that binary ALUs add the same way: two bits in and a carry in giving a sum and a carry out bit.)
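In C, the pair of three-input functions the tables describe boils down to this (a sketch for unbalanced base-3 digits 0..2, as in the tables above):

#include <stdio.h>

/* Sum trit and carry-out for two trits plus a carry-in. */
static void trit_add(int a, int b, int cin, int *sum, int *cout)
{
    int t = a + b + cin;  /* at most 2+2+1 = 5 */
    *sum = t % 3;         /* the sum trit */
    *cout = t / 3;        /* the carry out: 0 or 1, just as in binary */
}

int main(void)
{
    int sum, cout;
    trit_add(2, 2, 1, &sum, &cout);
    printf("2 + 2 + carry = %d%d (base 3)\n", cout, sum);  /* 12, i.e. 5 */
    return 0;
}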
Whether this can be implemented from elementary dyadic or triadic gates would be technology dependent.
The logical predicates AND/OR have no reason to be modified and should remain binary. Boolean arithmetic remains Boolean.
Besides, if you enumerate all ternary functions of two ternary arguments (i.e. 9 input combinations), you find 19683 of them. Contrast this with the 16 binary functions. This mess is unmanageable. (Don't even think of all the triadic ternary functions, 7625597484987 of them.)
Okay, so I believe I may have found the answer to my questions.
So, there's no reason to reinvent the wheel, using standard binary logic gates would work just fine in a ternary computer.
In binary, 0 is false and 1 is true, and a binary number is read from right to left, each digit corresponding to a power of two. For example, 10010 would be 18 (2 + 16). This number system only accumulates: every 1 adds its power of two to the total, and there is no way for a digit to subtract from it. All of this is done using transistors that only care whether voltage is flowing or not: if there is voltage, the bit is a 1, and if not, the bit is a 0.
In ternary, there wouldn't be a simple on/off. Unlike the standard transistor in binary computers, which tests whether or not there is voltage to determine the bit's value, a transistor for a ternary computer would test whether the voltage is negative, ground, or positive (-1 being negative voltage, 0 being ground, +1 being positive voltage).
Using this system, decimal numbers would be written in ternary much as they are written in binary, with one exception: ternary also allows digits to subtract. In binary, starting with the first digit on the right, the digits correspond to 2^0, then 2^1, and so on, and all the digits holding 1s contribute their power of 2 to the sum.
Now imagine ternary. From right to left it would follow 3^0, 3^1, 3^2, and so on; however, a +1 trit indicates that the corresponding power of 3 is added, a 0 trit means that digit is ignored just as a 0 is in binary, and a -1 trit means that the corresponding power of 3 is subtracted. This allows each digit to either add to or subtract from the total.
Take this ternary number for example (I'm going to use '-' instead of '-1', and '+' to represent '+1'): +-0+-
This would be, reading the digits from right to left, (-1) + (3) + (-27) + (81) = 56.
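A small C sketch of that reading rule, converting a balanced ternary string ('+' = +1, '0' = 0, '-' = -1) to decimal:

#include <stdio.h>

static int from_balanced_ternary(const char *s)
{
    int value = 0;
    for (int i = 0; s[i] != '\0'; ++i) {
        int trit = (s[i] == '+') ? 1 : (s[i] == '-') ? -1 : 0;
        value = value * 3 + trit;  /* accumulate left to right, signed digits */
    }
    return value;
}

int main(void)
{
    printf("%d\n", from_balanced_ternary("+-0+-"));  /* prints 56 */
    return 0;
}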
So while it's true that ternary hardware and ordinary dyadic logic gates are compatible, an ALU would have to be designed very differently. Basically, that's it. :)

Reduction of an odd number of elements in CUDA

It seems that it is only possible to do a reduction for an even number of elements. For example, suppose I need to sum up numbers. When I have an even number of elements, it will be like this:
1 2 3 4
1+2
3+3
6+4
But what to do when I have, for instance, 1 2 3 4 5? Is the last iteration the sum of three elements, 6+4+5, or what? I saw the same question here, but couldn't find the answer.
A parallel reduction will add pairs of elements first:
1 1+3 4+6
2 2+4
3
4
Your example with an odd number of elements would typically be realized as:
1 1+4 5+3 8+7
2 2+5 7+0
3 3+0
4 0+0
5
0
0
0
That is to say, typically a parallel reduction will work with a power-of-2 set of threads, and at most one threadblock (the last one) will have less than a full complement of data to work with. The usual method for handling this is to zero-pad the data out to the threadblock size. If you study the CUDA parallel reduction sample code, you'll find examples of this.
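A serial C sketch of the zero-padding idea (the CUDA sample does the same thing in parallel, one thread per pair at each step): pad the 5 elements out to the power-of-2 size 8, then add pairs until one value remains.

#include <stdio.h>

#define SIZE 8  /* next power of 2 >= 5 */

int main(void)
{
    int data[SIZE] = {1, 2, 3, 4, 5, 0, 0, 0};  /* zero-padded input */

    /* each pass halves the active range: 8 -> 4 -> 2 -> 1 */
    for (int stride = SIZE / 2; stride > 0; stride /= 2)
        for (int i = 0; i < stride; ++i)
            data[i] += data[i + stride];

    printf("sum = %d\n", data[0]);  /* prints 15 */
    return 0;
}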

For-Loop for finding combinations of springs?

I need to use a for-loop in a function in order to find the spring constants of all possible combinations of springs in series and in parallel. I have data for 5 springs, so I found the spring constant (K) of each, stored in a new matrix, by using polyfit to find the slope (using F = Kx).
I have created a function that does so; however, it returns the data not in a matrix but as individual outputs. So instead of KP (parallel) = [1 2 3 4 5], it says KP = 1, KP = 2, KP = 3, etc. Because of this, only the final output is stored in my workspace. Here is the code I have for the function. Keep in mind that the reason I need the +2 in the for-loop for b is that my original matrix K with all the spring constants is ten columns wide, with every other column being a 0, e.g. K = [1 0 2 0 3 0 4 0 5]. This is because my original dataset used to find K (the slope) was ten columns wide.
function [KP,KS] = function_name1(K)
L = length(K);
c = 1;
for a = 1:2:L
    for b = a+2:2:L
        KP = K(a) + K(b)
        KS = 1/((1/K(a)) + (1/K(b)))
    end
end
c = c + 1;
and then a program calling that function
[KP,KS]=function_name1(K);
What I tried: suppressing and unsuppressing lines of code (unsuccessful).
Any help would be greatly appreciated.
Hmmm... your code seems workable, but you aren't dealing with things in the most practical manner.
I'd start by redimensioning K so that it makes sense, that is, so that it's 5 spaces wide instead of your current 10; you'll see why in a minute.
Then I'd adjust KP and KS to the size that you want. I'm going to do a 5x5, as that will give all the permutations; right now it looks like you are doing some triangular thing, but I wouldn't worry too much about space unless you were to do this for, say, 50,000 spring constants or so.
So my code would look like this
function [KP,KS] = function_name1(K)
L = length(K);
KP = zeros(L);   % parallel combinations
KS = zeros(L);   % series combinations
for a = 1:L
    for b = 1:L
        KP(a,b) = K(a) + K(b);
        KS(a,b) = 1/((1/K(a)) + (1/K(b)));
    end
end
Then, when you want the parallel combination of springs 1 and 4, KP(1,4) or KP(4,1) will do the trick.