FFT pitch detection for guitar string - fft

I have simply pitch detection. Input (microphone) data are passed to fft routine, then I'm looking for a pitch with maximum value It means:
Max(pow(data[i].getRe(), 2) + pow(data[i].getIm(), 2)) for 0<= i < SAmplesSize
I need it for detection of guitar string's primary frequency. It works well for freq 440 hz (and maybe higher, i didn't check that) downto 250 hz. Below this value detected frequency is twice as high as it should be, ie. for 195 hz detected frequency is about 380 hz. It looks like it detects higher harmonics below 250 hz. For pure 195 hz tone it detects perfectly, but for quitar string something is wrong.
Any suggestion what can cause that ? Or should I use more sophisticated pitch detection?
Ps. sampling rate: 8000hz, input data size: 1024

I don't know about guitars specifically, but missing fundamentals seem to be quite common in acoustics. The Wikipedia page on pitch detection alludes to secondary processing steps after the FFT, perhaps one of these would be helpful.
Also, see these two SO questions, lots of good information there: (1), (2).

Related

Statistical method to know when enough performance test iterations have been performed

I'm doing some performance/load testing of a service. Imagine the test function like this:
bytesPerSecond = test(filesize: 10MB, concurrency: 5)
Using this, I'll populate a table of results for different sizes and levels of concurrency. There are other variables too, but you get the idea.
The test function spins up concurrency requests and tracks throughput. This rate starts off at zero, then spikes and dips until it eventually stabilises on the 'true' value.
However it can take a while for this stability to occur, and there are lot of combinations of input to evaluate.
How can the test function decide when it's performed enough samples? By enough, I suppose I mean that the result isn't going to change beyond some margin if testing continues.
I remember reading an article about this a while ago (from one of the jsperf authors) that discussed a robust method, but I cannot find the article any more.
One simple method would be to compute the standard deviation over a sliding window of values. Is there a better approach?
IIUC, you're describing the classic problem of estimating the confidence interval of the mean with unknown variance. That is, suppose you have n results, x1, ..., xn, where each of the xi is a sample from some process of which you don't know much: not the mean, not the variance, and not the distribution's shape. For some required confidence interval, you'd like to now whether n is large enough so that, with high probability the true mean is within the interval of your mean.
(Note that with relatively-weak conditions, the Central Limit Theorem guarantees that the sample mean will converge to a normal distribution, but to apply it directly you would need the variance.)
So, in this case, the classic solution to determine if n is large enough, is as follows:
Start by calculating the sample mean μ = ∑i [xi] / n. Also calculate the normalized sample variance s2 = ∑i [(xi - μ)2] / (n - 1)
Depending on the size of n:
If n > 30, the confidence interval is approximated as μ ± zα / 2(s / √(n)), where, if necessary, you can find here an explanation on the z and α.
If n < 30, the confidence interval is approximated as μ ± tα / 2(s / √(n)); see again here an explanation of the t value, as well as a table.
If the confidence is enough, stop. Otherwise, increase n.
Stability means rate of change (derivative) is zero or close to zero.
The test function spins up concurrency requests and tracks throughput.
This rate starts off at zero, then spikes and dips until it eventually
stabilises on the 'true' value.
I would track your past throughput values. For example last X values or so. According to this values, I would calculate rate of change (derivative of your throughput). If your derivative is close to zero, then your test is stable. I will stop test.
How to find X? I think instead of constant value, such as 10, choosing a value according to maximum number of test can be more suitable, for example:
X = max(10,max_test_count * 0.01)

understanding getByteTimeDomainData and getByteFrequencyData in web audio

The documentation for both of these methods are both very generic wherever I look. I would like to know what exactly I'm looking at with the returned arrays I'm getting from each method.
For getByteTimeDomainData, what time period is covered with each pass? I believe most oscopes cover a 32 millisecond span for each pass. Is that what is covered here as well? For the actual element values themselves, the range seems to be 0 - 255. Is this equivalent to -1 - +1 volts?
For getByteFrequencyData the frequencies covered is based on the sampling rate, so each index is an actual frequency, but what about the actual element values themselves? Is there a dB range that is equivalent to the values returned in the returned array?
getByteTimeDomainData (and the newer getFloatTimeDomainData) return an array of the size you requested - its frequencyBinCount, which is calculated as half of the requested fftSize. That array is, of course, at the current sampleRate exposed on the AudioContext, so if it's the default 2048 fftSize, frequencyBinCount will be 1024, and if your device is running at 44.1kHz, that will equate to around 23ms of data.
The byte values do range between 0-255, and yes, that maps to -1 to +1, so 128 is zero. (It's not volts, but full-range unitless values.)
If you use getFloatFrequencyData, the values returned are in dB; if you use the Byte version, the values are mapped based on minDecibels/maxDecibels (see the minDecibels/maxDecibels description).
Mozilla 's documentation describes the difference between getFloatTimeDomainData and getFloatFrequencyData, which I summarize below. Mozilla docs reference the Web Audio
experiment ; the voice-change-o-matic. The voice-change-o-matic illustrates the conceptual difference to me (it only works in my Firefox browser; it does not work in my Chrome browser).
TimeDomain/getFloatTimeDomainData
TimeDomain functions are over some span of time.
We often visualize TimeDomain data using oscilloscopes.
In other words:
we visualize TimeDomain data with a line chart,
where the x-axis (aka the "original domain") is time
and the y axis is a measure of a signal (aka the "amplitude").
Change the voice-change-o-matic "visualizer setting" to Sinewave to
see getFloatTimeDomainData(...)
Frequency/getFloatFrequencyData
Frequency functions (GetByteFrequencyData) are at a point in time; i.e. right now; "the current frequency data"
We sometimes see these in mp3 players/ "winamp bargraph style" music players (aka "equalizer" visualizations).
In other words:
we visualize Frequency data with a bar graph
where the x-axis (aka "domain") are frequencies or frequency bands
and the y-axis is the strength of each frequency band
Change the voice-change-o-matic "visualizer setting" to Frequency bars to see getFloatFrequencyData(...)
Fourier Transform (aka Fast Fourier Transform/FFT)
Another way to think about "time domain vs frequency" is shown the diagram below, from Fast Fourier Transform wikipedia
getFloatTimeDomainData gives you the chart on on the top (x-axis is Time)
getFloatFrequencyData gives you the chart on the bottom (x-axis is Frequency)
a Fast Fourier Transform (FFT) converts the Time Domain data into Frequency data, in other words, FFT converts the first chart to the second chart.
cwilso has it backwards.
the time data array is the longer one (fftSize), and the frequency data array is the shorter one (half that, frequencyBinCount).
fftSize of 2048 at the usual sample rate of 44.1kHz means each sample has 1/44100 duration, you have 2048 samples at hand, and thus are covering a duration of 2048/44100 seconds, which 46 milliseconds, not 23 milliseconds. The frequencyBinCount is indeed 1024, but that refers to the frequency domain (as the name suggests), not the time domain, and it the computation 1024/44100, in this context, is about as meaningful as adding your birth date to the fftSize.
A little math illustrating what's happening: Fourier transform is a 'vector space isomorphism', that is, a mapping going bijectively (i.e., reversible) between 2 vector spaces of the same dimension; the 'time domain' and the 'frequency domain.' The vector space dimension we have here (in both cases) is fftSize.
So where does the 'half' come from? The frequency domain coefficients 'count double'. Either because they 'actually are' complex numbers, or because you have the 'sin' and the 'cos' flavor. Or, because you have a 'magnitude' and a 'phase', which you'll understand if you know how complex numbers work. (Those are 3 ways to say the same in a different jargon, so to speak.)
I don't know why the API only gives us half of the relevant numbers when it comes to frequency - I can only guess. And my guess is that those are the 'magnitude' numbers, and the 'phase' numbers are thrown out. The reason that this is my guess is that in applications, magnitude is far more important than phase. Still, I'm quite surprised that the API throws out information, and I'd be glad if some expert who actually knows (and isn't guessing) can confirm that it's indeed the magnitude. Or - even better (I love to learn) - correct me.

the Frequencies from the FFT is showing values that it shouldnt

I'm developing a software to input a monotonic .wav clip (piano) and show the piano notes which are played in that clip. I'm using FFT to calculate the frequencies but they are giving me values such as 22360 Hz and so on where I want to get around 260 to 600 Hz.
Can someone please help me with this?
Pianos put out a lot of powerful high harmonics or overtones, and thus an FFT should show amplitude in many high frequency bins. Perhaps you should use a pitch detection or estimation algorithm instead of just an FFT?
I think your problem is that you don't have enough samples, so the frequency resolution is poor. All you need to do is to have more samples or just zero-padding. See here and here. That may help.
Hotpaw2 makes an important point about overtone content.
However another thing you will require is a window function to prevent frequency domain artifacts of the sampling interval from contaminating your result. The window function applied to the data before the FFT essentially fades the signal in and out smoothly to avoid this.

Compressing a binary matrix

We were asked to find a way to compress a square binary matrix as much as possible, and if possible, to add redundancy bits to check and maybe correct errors.
The redundancy thing is easy to implement in my opinion. The complicated part is compressing the matrix. I thought about using run-length after reshaping the matrix to a vector because there will be more zeros than ones, but I only achieved a 40bits compression (we are working on small sizes) although I thought it'd be better.
Also, after run-length an idea was Huffman coding the matrix, but a dictionary must be sent in order to recover the original information.
I'd like to know what would be the best way to compress a binary matrix?
After reading some comments, yes #Adam you're right, the 14x14 matrix should be compressed in 128bits, so if I only use the coordinates (rows&cols) for each non-zero element, still it would be 160bits (since there are twenty ones). I'm not looking for an exact solution but for a useful idea.
You can only talk about compressing something if you have a distribution and a representation. That's the issue of the dictionary you have to send along: you always need some sort of dictionary of protocol to uncompress something. It just so happens that things like .zip and .mpeg already have those dictionaries/codecs. Even something as simple as Huffman-encoding is an algorithm; on the other side of the communication channel (you can think of compression as communication), the other person already has a bit of code (the dictionary) to perform the Huffman decompression scheme.
Thus you cannot even begin to talk about compressing something without first thinking "what kinds of matrices do I expect to see?", "is the data truly random, or is there order?", and if so "how can I represent the matrices to take advantage of order in the data?".
You cannot compress some matrices without increasing the size of other objects (by at least 1 bit). This is bad news if all matrices are equally probable, and you care equally about them all.
Addenda:
The answer to use sparse matrix machinery is not necessarily the right answer. The matrix could for example be represented in python as [[(r+c)%2 for c in range (cols)] for r in range(rows)] (a checkerboard pattern), and a sparse matrix wouldn't compress it at all, but the Kolmogorov complexity of the matrix is the above program's length.
Well, I know every matrix will have the same number of ones, so this is kind of deterministic. The only think I don't know is where the 1's will be. Also, if I transmit the matrix with a dictionary and there are burst errors, maybe the dictionary gets affected so... wouldnt be the resulting information corrupted? That's why I was trying to use lossless data compression such as run-length, the decoder just doesnt need a dictionary. --original poster
How many 1s does the matrix have as a fraction of its size, and what is its size (NxN -- what is N)?
Furthermore, this is an incorrect assertion and should not be used as a reason to desire run-length encoding (which still requires a program); when you transmit data over a channel, you can always add error-correction to this data. "Data" is just a blob of bits. You can transmit both the data and any required dictionaries over the channel. The error-correcting machinery does not care at all what the bits you transmit are for.
Addendum 2:
There are (14*14) choose 20 possible arrangements, which I assume are randomly chosen. If this number was larger than 128^2 what you're trying to do would be impossible. Fortunately log_2((14*14) choose 20) ~= 90bits < 128bits so it's possible.
The simple solution of writing down 20 numbers like 32,2,67,175,52,...,168 won't work because log_2(14*14)*20 ~= 153bits > 128bits. This would be equivalent to run-length encoding. We want to do something like this but we are on a very strict budget and cannot afford to be "wasteful" with bits.
Because you care about each possibility equally, your "dictionary"/"program" will simulate a giant lookup table. Matlab's sparse matrix implementation may work but is not guaranteed to work and is thus not a correct solution.
If you can create a bijection between the number range [0,2^128) and subsets of size 20, you're good to go. This corresponds to enumerating ways to descend the pyramid in http://en.wikipedia.org/wiki/Binomial_coefficient to the 20th element of row 196. This is the same as enumerating all "k-combinations". See http://en.wikipedia.org/wiki/Combination#Enumerating_k-combinations
Fortunately I know that Mathematica and Sage and other CAS software can apparently generate the "5th" or "12th" or arbitrarily numbered k-subset. Looking through their documentation, we come upon a function called "rank", e.g. http://www.sagemath.org/doc/reference/sage/combinat/subset.html
So then we do some more searching, and come across some arcane Fortran code like http://people.sc.fsu.edu/~jburkardt/m_src/subset/ksub_rank.m and http://people.sc.fsu.edu/~jburkardt/m_src/subset/ksub_unrank.m
We could reverse-engineer it, but it's kind of dense. But now we have enough information to search for k-subset rank unrank, which leads us to http://www.site.uottawa.ca/~lucia/courses/5165-09/GenCombObj.pdf -- see the section
"Generating k-subsets (of an n-set): Lexicographical
Ordering" and the rank and unrank algorithms on the next few pages.
In order to achieve the exact theoretically optimal compression, in the case of a uniformly random distribution of 1s, we must thus use this technique to biject our matrices to our output number of range <2^128. It just so happens that combinations have a natural ordering, known as ranking and unranking of combinations. You assign a number to each combination (ranking), and if you know the number you automatically know the combination (unranking). Googling k-subset rank unrank will probably yield other algorithms.
Thus your solution would look like this:
serialize the matrix into a list
e.g. [[0,0,1][0,1,1][1,0,0]] -> [0,0,1,0,1,1,1,0,0]
take the indices of the 1s:
e.g. [0,0,1,0,1,1,1,0,0] -> [3,5,6,7]
1 2 3 4 5 6 7 8 9 a k=4-subset of an n=9 set
take the rank
e.g. compressed = rank([3,5,6,7], n=9)
compressed==412 (or something, I made that up)
you're done!
e.g. 412 -binary-> 110011100 (at most n=9bits, less than 2^n=2^9=512)
to uncompress, unrank it
I'll get to 128 bits in a sec, first here's how you fit a 14x14 boolean matrix with exactly 20 nonzeros into 136 bits. It's based on the CSC sparse matrix format.
You have an array c with 14 4-bit counters that tell you how many nonzeros are in each column.
You have another array r with 20 4-bit row indices.
56 bits (c) + 80 bits (r) = 136 bits.
Let's squeeze 8 bits out of c:
Instead of 4-bit counters, use 2-bit. c is now 2*14 = 28 bits, but can't support more than 3 nonzeros per column. This leaves us with 128-80-28 = 20 bits. Use that space for array a4c with 5 4-bit elements that "add 4 to an element of c" specified by the 4-bit element. So, if a4c={2,2,10,15, 15} that means c[2] += 4; c[2] += 4 (again); c[10] += 4;.
The "most wasteful" distribution of nonzeros is one where the column count will require an add-4 to support 1 extra nonzero: so 5 columns with 4 nonzeros each. Luckily we have exactly 5 add-4s available.
Total space = 28 bits (c) + 20 bits
(a4c) + 80 bits (r) = 128 bits.
Your input is a perfect candidate for a sparse matrix. You said you're using Matlab, so you already have a good sparse matrix built for you.
spm = sparse(dense_matrix)
Matlab's sparse matrix implementation uses Compressed Sparse Columns, which has memory usage on the order of 2*(# of nonzeros) + (# of columns), which should be pretty good in your case of 20 nonzeros and 14 columns. Storing 20 values sure is better than storing 196...
Also remember that all matrices in Matlab are going to be composed of doubles. Just because your matrix can be stored as a 1-bit boolean doesn't mean Matlab won't stick it into a 64-bit floating point value... If you do need it as a boolean you're going to have to make your own type in C and use .mex files to interface with Matlab.
After thinking about this again, if all your matrices are going to be this small and they're all binary, then just store them as a binary vector (bitmask). Going off your 14x14 example, that requires 196 bits or 25 bytes (plus n, m if your dimensions are not constant). That same vector in Matlab would use 64 bits per element, or 1568 bytes. So storing the matrix as a bitmask takes as much space as 4 elements of the original matrix in Matlab, for a compression ratio of 62x.
Unfortunately I don't know if Matlab supports bitmasks natively or if you have to resort to .mex files. If you do get into C++ you can use STL's vector<bool> which implements a bitmask for you.

Why is the knapsack problem pseudo-polynomial?

I know that Knapsack is NP-complete while it can be solved by DP. They say that the DP solution is pseudo-polynomial, since it is exponential in the "length of input" (i.e. the numbers of bits required to encode the input). Unfortunately I did not get it. Can anybody explain that pseudo-polynomial thing to me slowly ?
The running time is O(NW) for an unbounded knapsack problem with N items and knapsack of size W. W is not polynomial in the length of the input though, which is what makes it pseudo-polynomial.
Consider W = 1,000,000,000,000. It only takes 40 bits to represent this number, so input size = 40, but the computational runtime uses the factor 1,000,000,000,000 which is O(240).
So the runtime is more accurately said to be O(N.2bits in W), which is exponential.
Also see:
How to understand the knapsack problem is NP-complete?
The NP-Completeness of Knapsack
Complexity of dynamic programming algorithm for the 0-1 knapsack problem
Pseudo-polynomial time
In most of our problems, we're dealing with large lists of numbers which fit comfortably inside standard int/float data types. Because of the way most processors are built to handle 4-8 byte numbers at a time at no additional cost (relative to numbers than fit in, say, 1 byte), we rarely encounter a change in running time from scaling our numbers up or down within ranges we encounter in real problems - so the dominant factor remains just the sheer quantity of data points, the n or m factors that we're used to.
(You can imagine that the Big-O notation is hiding a constant factor that divides-out 32 or 64 bits-per-datum, leaving only the number-of-data-points whenever each of our numbers fit in that many bits or less)
But try reworking with other algorithms to act on data sets involving big ints - numbers that require more than 8 bytes to represent - and see what that does to the runtime. The magnitude of the numbers involved always makes a difference, even in the other algorithms like binary sort, once you expand beyond the buffer of safety conventional processors give us "for-free" by handling 4-8 byte batches.
The trick with the Knapsack algorithm that we discussed is that it's unusually sensitive (relative to other algorithms ) to the magnitude of a particular parameter, W. Add one bit to W and you double the running time of the algorithm. We haven't seen that kind of dramatic response to changes in value in other algorithms before this one, which is why it might seem like we're treating Knapsack differently - but that's a genuine analysis of how it responds in a non-polynomial fashion to changes in input size.
The way I understand this is that the capacity would've been O(W) if the capacity input were an array of [1,2,...,W], which has a size of W. But the capacity input is not an array of numbers, it's instead a single integer. The time complexity is about the relationship to the size of input. The size of an integer is NOT the value of the integer, but the number of bits representing it. We do later convert this integer W into an array [1,2,...,W] in the algorithm, leading people into mistakenly thinking W is the size, but this array is not the input, the integer itself is.
Think of input as "an array of stuff", and the size as "how many stuff in the array". The item input is actually an array of n items in the array so size=n. The capacity input is NOT an array of W numbers in it, but a single integer, represented by an array of log(W) bits. Increase the size of it by 1 (adding 1 meaningful bit), W doubles so run time doubles, hence the exponential time complexity.
The Knapsack algorithm's run-time is bound not only on the size of the input (n - the number of items) but also on the magnitude of the input (W - the knapsack capacity) O(nW) which is exponential in how it is represented in computer in binary (2^n) .The computational complexity (i.e how processing is done inside a computer through bits) is only concerned with the size of the inputs, not their magnitudes/values.
Disregard the value/weight list for a moment. Let's say we have an instance with knapsack capacity 2. W would take two bits in the input data. Now we shall increase the knapsack capacity to 4, keeping the rest of the input. Our input has only grown by one bit, but the computational complexity has increased twofold. If we increase the capacity to 1024, we would have just 10 bits of the input for W instead of 2, but the complexity has increased by a factor of 512. Time complexity grows exponentially in the size of W in binary (or decimal) representation.
Another simple example that helped me understand the pseudo-polynomial concept is the naive primality testing algorithm. For a given number n we are checking if it's divided evenly by each integer number in range 2..√n, so the algorithm takes √(n−1) steps. But here, n is the magnitude of the input, not it's size.
Now The regular O(n) case
By contrast, searching an array for a given element runs in polynomial time: O(n). It takes at most n steps and here n is the size of the input (the length of the array).
[ see here ]
Calculating bits required to store decimal number
Complexity is based on input. In knapsack problem, Inputs are size, max Capacity, and profit, weight arrays. We construct dp table as size * W so we feel as its of polynomial time complexity. But, input W is an integer, not an array. So, it will be O(size*(no Of bits required to store given W)). If no of bits increase by 1, then running time doubles. Thus it is exponential, thereby pseudo-polynomial.