About system data rate in H.264

I have an engine that checks whether an H.264 video complies with the AVCHD or BDMV spec. The spec says the maximum system data rate is up to 24 Mbit/s, and I want to know how the system data rate is calculated. Does it mean the average over the whole file, or the average over one second?

The maximum says that you will never exceed 24 Mbit/s, so you will never send more than one bit in any 42 ns (approximately) period. You can scale that to any time frame you want by simple multiplication, up to the point where you will never burst beyond 24 Mbits in one second (and you will still never send more than one bit in any of the 24M 42 ns periods that make up that second).
When you calculate an average for any time period, it MUST be below the specified maximum burst rate, but it is still just an average. Those of us in the CATV industry spend a lot of time trying to make the transmission system behave as if the average rate were a constant rate, because if you have a certain throughput (in bits) for video, you don't want to waste any of it. We "rate shape" the video as well as using adaptive buffering in the digital set-top boxes that receive the signal.
A single QAM256 channel on the U.S. broadband cable system will support 40 Mbit/s and usually carries between 10 and 12 standard-definition signals with an average bit rate of approximately 4 Mbit/s. These channels will burst to 9 Mbit/s when there is a lot of unpredictable change in the picture. As you can imagine, a boxing match (with a lot of movement) takes significantly more bandwidth than a network news anchor reading from their desk, so we also try to match channels to fill the available bandwidth.
Typically, we can only fit three high-definition channels in the same 40 Mbit/s channel; these have an average bit rate of about 12.5 Mbit/s and, as you've noted above, are limited to 24 Mbit/s.
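If you just want a rough sanity check of the kind described in the question, rather than the formal hypothetical-reference-decoder buffer model that H.264 itself defines, a sketch like the following (Python, with made-up frame sizes) reports the worst bit rate seen in any one-second window:

    # Sketch: worst-case bit rate over any sliding 1-second window, given the
    # size of each coded frame in bytes and a fixed frame rate. Illustrative only.
    def max_windowed_bitrate(frame_sizes_bytes, fps):
        window = int(round(fps))                     # frames per 1-second window
        worst = 0
        for start in range(max(1, len(frame_sizes_bytes) - window + 1)):
            bits = 8 * sum(frame_sizes_bytes[start:start + window])
            worst = max(worst, bits)
        return worst                                 # bits per second, worst window

    # Hypothetical 30 fps stream with a mix of small and large coded frames
    frames = [60_000, 55_000, 120_000] * 30          # bytes per coded frame
    print(max_windowed_bitrate(frames, fps=30) / 1e6, "Mbit/s")

If that worst one-second window stays under 24 Mbit/s, the whole-file average will too, but not vice versa.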
Hope this helps!

Related

Designing a Kalman filter

I am working on a project to monitor a water tank for energy storage. I have a large water tank, to which 10 thermometers are attached. By measuring the temperature, I can estimate the energy stored in the tank. That is quite simple, but I want to add another feature.
By sampling the amount of energy in time, I want to determine the power that is currently flowing in or out of the tank. I am measuring the energy in the tank every minute and determining the current power by comparing it to the last measurement. The direct result is very noisy (for example jumping between 2-4 kW), so I need some kind of filtering.
I started simply with averaging (the current power is the average of the last 10 measurements), which works fine, but I wanted to try something a bit fancier. So I wrote a simple Kalman filter, the one described in this video: https://www.youtube.com/watch?v=PZrFFg5_Sd0&list=PLX2gX-ftPVXU3oUFNATxGXY90AULiqnWT&index=5 The problem is, I guess, that this filter is designed for static measurements, and my measurement changes quite a bit. As an example, in the plot below there are 60 measurements (one every minute), and the line is the output of the Kalman filter:
[Plot: 60 one-minute power measurements with the Kalman filter output overlaid]
Is it possible to modify the filter to follow the actual value more accurately? Or is this kind of filter just not suitable for this system, and should I use something else?
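For reference, here is a minimal sketch (Python, made-up numbers) of the scalar Kalman filter described above; the process-noise term q is what allows the estimate to follow a value that genuinely changes, so increasing it is usually the first thing to try:

    # Minimal scalar Kalman filter: the state is the current power, modelled as
    # roughly constant between samples. q (process noise) sets how fast the
    # estimate may drift to follow real changes; r (measurement noise) sets how
    # much each noisy reading is trusted.
    def kalman_1d(measurements, q=0.05, r=1.0, x0=0.0, p0=1.0):
        x, p = x0, p0
        estimates = []
        for z in measurements:
            p = p + q                      # predict: uncertainty grows
            k = p / (p + r)                # Kalman gain
            x = x + k * (z - x)            # update toward the measurement
            p = (1 - k) * p
            estimates.append(x)
        return estimates

    # Illustrative noisy power readings (kW), one per minute
    print(kalman_1d([2.1, 3.8, 2.5, 3.9, 2.2, 3.6, 2.8, 3.3]))

With q = 0 this reduces to the purely static filter; with larger q it behaves more like a short moving average.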

The drawbacks of the short-time Fourier transform

I have read the following sentence in a document discussing the drawbacks of the short-time Fourier transform; it says:
the drawback is that once you choose a particular size for the time
window, that window is the same for all frequencies
So what is the relation between frequency and the size of the window? If we have a high-frequency component in one part of a signal, why would we be unable to detect that frequency if the window is not small (or large) enough?
Furthermore, it says about the wavelet transform:
Wavelet analysis allows the use of long time intervals where we want
more precise low-frequency information, and shorter regions where we
want high-frequency information
I feel that the answer is somehow related to the Nyquist rate.
For sampled data, the number of orthogonal sinusoidal FT basis vectors below half the sample rate increases with the length of the STFT window, and the bandwidth of each DFT/FFT result bin for each basis vector decreases. If the window is too short, then each DFT result might detect not only your high frequency component of interest, but a greater bandwidth of adjacent frequencies.
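As a small numerical illustration of that trade-off (a sketch with an arbitrary sample rate): the longer the window, the more basis vectors there are below half the sample rate and the narrower each bin is, at the cost of poorer time localization.

    fs = 44100.0                               # sample rate (arbitrary example)
    for n in (64, 256, 1024, 4096):            # candidate STFT window lengths
        bins = n // 2 + 1                      # distinct bins from DC up to fs/2
        print(f"window = {n:4d} samples ({1000 * n / fs:6.1f} ms): "
              f"{bins:4d} bins, each about {fs / n:7.2f} Hz wide")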

How to find all frequencies in audio with the discrete Fourier transform?

I want to analyze some audio and decompose it as best as I can into sine waves. I have never used the FFT before and am just doing some initial reading about the concepts and available libraries, like FFTW and KissFFT.
I'm confused on this point... it sounds like the DFT/FFT will give you the sine amplitudes only at certain frequencies, multiples of a base frequency. For example, if I have audio sampled at the usual 44100 Hz and I pick a chunk of, say, 256 samples, then that chunk could fit one cycle of 44100/256 ≈ 172 Hz, and the DFT will give me the sine amplitudes at 172, 172*2, 172*3, etc. Is that correct? How do you then find the strength at other frequencies? I'd like to see a spectrum all the way from 20 Hz to about 15 kHz, at about 1 Hz increments.
Fourier decomposition allows you to take any function of time and describe it as a sum of sine waves, each with its own amplitude and frequency. If, however, you want to approach this problem using the DFT, you need to make sure you have sufficient resolution in the frequency domain to distinguish between different frequencies. Once you have that, you can determine which frequencies are dominant in the signal and create a signal consisting of multiple sine waves at those frequencies. You are correct in saying that with a sampling frequency of 44.1 kHz and only 256 samples, the lowest frequency you will be able to detect in those 256 samples is about 172 Hz.
OBTAIN SUFFICIENT RESOLUTION IN THE FREQUENCY DOMAIN:
Amplitude values "only at certain frequencies, multiples of a base frequency" is true of Fourier series decomposition, NOT the DFT, which has a frequency resolution of a certain increment. The frequency resolution of the DFT is related to the sampling rate and the number of time-domain samples used to calculate it. Reducing the frequency spacing gives you a better ability to distinguish between two frequencies close together, and this can be done in two ways:
Decrease the sampling rate, but this moves the periodic repetitions in frequency closer together (remember the Nyquist theorem here).
Increase the number of samples used to calculate the DFT. If only the 256 samples are available, you can perform "zero padding", where 0-valued samples are appended to the end of the data, but this has some effects that need to be considered (see the sketch below).
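As a rough sketch of the second option (Python with NumPy, arbitrary numbers): zero padding samples the underlying spectrum on a finer grid, so a peak's location can be read off more precisely, but it does not add information that would separate two genuinely close tones.

    import numpy as np

    fs = 44100.0
    n = 256
    x = np.sin(2 * np.pi * 1000 * np.arange(n) / fs)   # 256 samples of a 1 kHz tone

    plain  = np.fft.rfft(x)                             # bin spacing fs/256  ~ 172 Hz
    padded = np.fft.rfft(x, n=4096)                     # bin spacing fs/4096 ~ 10.8 Hz

    print("plain  spacing:", fs / n, "Hz, peak near",
          np.fft.rfftfreq(n, 1 / fs)[np.argmax(np.abs(plain))], "Hz")
    print("padded spacing:", fs / 4096, "Hz, peak near",
          np.fft.rfftfreq(4096, 1 / fs)[np.argmax(np.abs(padded))], "Hz")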
HOW TO COME TO A CONCLUSION:
If you depict the frequency content of different audio signals in individual graphs, you will find that the amplitudes differ a bit. This is because the individual signals will not be identical in sound, and there is always noise inherent in any signal (from the surroundings and the hardware itself). Therefore, what you want to do is take the average of two or more DFT spectra to remove noise and get a more accurate representation of the frequency content. Depending on your application, this may not be possible if the sound you are capturing is changing noticeably over time (for example speech, or music). Averaging is thus only useful if all the signals to be averaged are pretty much equal in sound (individual separate recordings of "the same thing"). Just to clarify: from, for example, four time-domain signals, you want to create four frequency-domain signals (using a DFT method), and then calculate the average of the four frequency-domain signals into a single averaged frequency-domain signal. This will remove noise and give you a better representation of which frequencies are inherent in your audio.
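A minimal sketch of that averaging step (Python with NumPy), assuming several equal-length recordings of "the same thing"; all names and numbers are illustrative:

    import numpy as np

    def averaged_spectrum(recordings, fs):
        """Average the magnitude spectra of several equal-length recordings."""
        mags = [np.abs(np.fft.rfft(r * np.hanning(len(r)))) for r in recordings]
        freqs = np.fft.rfftfreq(len(recordings[0]), d=1 / fs)
        return freqs, np.mean(mags, axis=0)   # noise averages down, common tones remain

    # Illustrative use: four noisy takes of the same 440 Hz tone
    fs, n = 8000, 2048
    t = np.arange(n) / fs
    takes = [np.sin(2 * np.pi * 440 * t) + 0.5 * np.random.randn(n) for _ in range(4)]
    freqs, avg = averaged_spectrum(takes, fs)
    print("strongest component near", freqs[np.argmax(avg)], "Hz")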
AN ALTERNATIVE SOLUTION:
If you know that your signal is supposed to contain a certain number of dominant frequencies (not too many) and these are the only ones you are interested in, then I would recommend that you use Pisarenko's harmonic decomposition (PHD) or Multiple Signal Classification (MUSIC, nice abbreviation!) to find these frequencies (and their corresponding amplitude values). This is less computationally intensive than the DFT. For example, if you KNOW the signal contains 3 dominant frequencies, Pisarenko will return the frequency values for these three, but keep in mind that the DFT reveals much more information, allowing you to come to more conclusions.
Your initial assumption is incorrect. An FFT/DFT will not give you amplitudes only at certain discrete frequencies. Those discrete frequencies are only the centers of bins, each bin constituting a narrow-band filter with a main lobe of non-zero bandwidth, roughly a width or two of the FFT bin separation, depending on the window (rectangular, von Hann, etc.) applied before the FFT. Thus the amplitude of spectral content between bin centers will show up, but spread across multiple FFT result bins.
If the separation of key signals is large enough and the noise level is low enough, then you can interpolate the FFT results to examine frequencies between bin centers. You may need to use a high quality interpolator, such as a Sinc kernel.
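One brute-force way to look between bin centers, effectively the same as ideal (sinc-kernel) interpolation of the FFT of that block for a single frequency, is to evaluate the DTFT of the block directly at the frequency of interest. A sketch (Python with NumPy, arbitrary numbers):

    import numpy as np

    def dtft_at(x, freq_hz, fs):
        # Evaluate the DTFT magnitude of block x at one arbitrary frequency (Hz).
        n = np.arange(len(x))
        return np.abs(np.sum(x * np.exp(-2j * np.pi * freq_hz * n / fs)))

    fs = 44100.0
    x = np.sin(2 * np.pi * 440 * np.arange(4096) / fs)
    for f in (430.0, 440.0, 450.0):            # frequencies between/at FFT bin centers
        print(f, "Hz ->", dtft_at(x, f, fs))

This costs O(N) per frequency, so it only makes sense for a handful of frequencies; for a dense grid, a zero-padded FFT is the cheaper route.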
If your signal separation is smaller or the noise level is higher, then you may need a longer window of data to feed a longer FFT to gather sufficient resolution information. An FFT window of length 256 at a 44.1k sample rate is almost certainly too short to gather sufficient information regarding spectral content below a few hundred Hz, if those are among the frequencies you would like to examine, as they can't be separated cleanly from a DC bias (bin 0).
Unfortunately, there's a degree of uncertainty in identifying the frequencies in a fixed sample of a signal. If you use a short FFT, then there's no way to tell the difference between frequencies over a fairly wide range. If you use a long FFT to get higher resolution in the frequency domain, then you can't detect frequency changes as quickly. This is inherent in the math.
Off the top of my head: If you want a 15kHz range at 1Hz increments, you need a 15000 point FFT, which at 44.1kHz means you'll get a frequency plot three times per second. (I may be missing a factor of 2 in there as I can't recall whether the Nyquist limit means you actually want a 30kHz bandwidth.)
You may also be interested in the Short-time Fourier transform. It doesn't solve the fundamental trade-off problem but in practice may get you what you want.

Microphone Pitch/Freq Detection (actionscript 3.0 in particular)

So, I'm trying to detect the average frequency of a sound recorded from the microphone. It can be assumed that this sound will be in mp3 or wav form. My final goal is to do this live (or close enough), but for now simply finding the average frequency of an mp3 or wav is good enough to start with.
I'm having an unbelievably hard time finding any classes in ActionScript 3.0 that can help me with this task. Can anyone help me out by suggesting AS3.0 classes or algorithms to look at for this particular task?
Thanks to all in advance.
The best approach varies depending on the audio you're analyzing. For monophonic input, such as a singer, flute, or trumpet, an autocorrelation approach can work well. The idea here is that you're trying to find the period of the waveform by comparing it against itself at various intervals to find the best match. Imagine you have a sound wave whose period starts over after every 400 samples. If you were to iterate over some number of samples, always comparing the sample at index i with the sample at index (i + 400), perhaps by subtracting one from the other and adding the absolute difference to a running total, you'd find that your total would be 0 if the wave were a perfect match of itself at this interval of 400. Of course, you don't know that 400 is your magic number, so you need to check a variety of intervals that fall within your possible range. You could exclude intervals that would result in a frequency that's impossibly low or high. You also would obviously not expect to find a perfect match, but generally speaking, the interval where the match is closest corresponds to the period, and hence the frequency, of a monophonic pitch.
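A bare-bones sketch of that lag search (in Python rather than ActionScript; the sample rate, lag bounds and test tone are illustrative, and a real detector needs extra care about octave errors and windowing):

    import numpy as np

    def estimate_pitch(samples, fs, f_min=150.0, f_max=1000.0):
        # Return the frequency (Hz) whose lag makes the waveform best match itself.
        best_lag, best_score = None, float("inf")
        for lag in range(int(fs / f_max), int(fs / f_min) + 1):
            # Smaller summed difference = better self-match at this candidate period.
            diff = np.sum(np.abs(samples[:-lag] - samples[lag:]))
            if diff < best_score:
                best_lag, best_score = lag, diff
        return fs / best_lag

    # Illustrative use: a 220 Hz tone at 44.1 kHz. Keep the lag range narrow enough
    # that multiples of the true period (octave errors) are excluded.
    fs = 44100
    t = np.arange(4096) / fs
    print(estimate_pitch(np.sin(2 * np.pi * 220 * t), fs))   # ~220 Hz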
For polyphonic sources, or instruments with a timbre that's very rich in harmonics like violin or guitar, you may need to use a different approach. FFT based approaches are widely used for this in order to break down audio segments into their harmonic pitch content. It's then a matter of applying some rules that you come up with for deciding which frequencies coming out of the FFT are your best bet.
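And a correspondingly rough sketch of the FFT side, with the "rules" reduced to simply keeping the strongest bins (again Python, illustrative only):

    import numpy as np

    def dominant_frequencies(samples, fs, count=5):
        # Return the frequencies of the `count` strongest spectral bins.
        spec = np.abs(np.fft.rfft(samples * np.hanning(len(samples))))
        freqs = np.fft.rfftfreq(len(samples), d=1 / fs)
        top = np.argsort(spec)[-count:]        # indices of the largest bins
        return sorted(freqs[top])

    # Illustrative use: a C major chord (261.6, 329.6 and 392.0 Hz)
    fs = 44100
    t = np.arange(8192) / fs
    chord = sum(np.sin(2 * np.pi * f * t) for f in (261.6, 329.6, 392.0))
    print(dominant_frequencies(chord, fs))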

Computing estimated times of file copies / movements?

Inspired by this xkcd cartoon, I wondered what exactly is the best mechanism to provide the user with an estimate for a file copy / move?
The alt tag on xkcd reads as follows:
They could say "the connection is probably lost," but it's more fun to do naive time-averaging to give you hope that if you wait around for 1,163 hours, it will finally finish.
Ignoring the joke, is that really how it's done in Windows? How about other operating systems? Is there a better way?
Have a look at my answer to a similar question (and the other answers there) on how the remaining time is estimated in Windows Explorer.
In my opinion, there is only one way to get good estimates:
Calculate the exact number of bytes to be copied before you begin the copy process
Recalculate your estimate regularly (every 1, 5 or 10 seconds, YMMV) based on the current transfer speed
The current transfer speed can fluctuate heavily when you are copying over a network, so use an average, for example based on the number of bytes transferred since your last estimate.
Note that the first point may require quite some work, if you are copying many files. That is probably why the guys from Microsoft decided to go without it. You need to decide yourself if the additional overhead created by that calculation is worth giving your user a better estimate.
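A small sketch of that recalculation loop (Python; bytes_copied_total is a hypothetical callback reporting progress, and the polling interval is arbitrary):

    import time

    def report_remaining(total_bytes, bytes_copied_total, interval=5.0):
        # Re-estimate the time left from the speed observed since the last poll.
        last_copied, last_time = 0, time.monotonic()
        while True:
            time.sleep(interval)
            copied, now = bytes_copied_total(), time.monotonic()
            speed = (copied - last_copied) / (now - last_time)   # bytes/s recently
            last_copied, last_time = copied, now
            if copied >= total_bytes:
                return
            if speed > 0:
                print(f"~{(total_bytes - copied) / speed:.0f} s remaining")
            else:
                print("stalled, no estimate")

Smoothing the measured speed over the last few polls (a rolling average) makes the displayed figure jump around less.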
I've done something similar to estimate when a queue will be empty, given that items are being dequeued faster than they are being enqueued. I used linear regression over the most recent N readings of (time,queue size).
This gives better results than a naive
(elapsed_time / bytes_copied_so_far) * bytes_left_to_copy
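A sketch of that regression idea (Python with NumPy; names and numbers are illustrative): fit a line to the last N (time, bytes remaining) readings and extrapolate to where it hits zero.

    import numpy as np

    def eta_from_recent(readings, window=20):
        # readings: list of (timestamp_s, bytes_remaining); returns seconds until empty.
        recent = np.array(readings[-window:], dtype=float)
        times, remaining = recent[:, 0], recent[:, 1]
        slope, intercept = np.polyfit(times, remaining, 1)   # remaining ~= slope*t + b
        if slope >= 0:
            return None                                      # not draining: no estimate
        return -intercept / slope - times[-1]                # time until the line hits 0

    # Illustrative use: draining about 1 MB/s with some jitter
    data = [(t, 60e6 - 1e6 * t + 2e5 * np.random.randn()) for t in range(30)]
    print(eta_from_recent(data))

Because the fit uses only recent readings, a change in transfer speed gradually re-slopes the line instead of making the estimate jump.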
Start a global timer that fires, say, every 1000 milliseconds and update a total elapsed time counter. Let's call this variable "elapsedTime"
While the file is being copied, update some local variable with the amount already copied. Let's call this variable "totalCopied"
In the timer event that is periodically raised, divide totalCopied by elapsedTime to give the number of bytes copied per timer interval (in this case 1000 ms, i.e. per second). Let's call this variable "bytesPerSec"
Divide the total file size by bytesPerSec to obtain the total number of seconds theoretically required to copy this file. Let's call this variable "totalTime"
Subtract elapsedTime from totalTime and you get a somewhat accurate estimate of the remaining copy time.
I think dialogs should just admit their limitations. It's not annoying because it's failing to give a useful time estimate, it's annoying because it's authoritatively offering an estimate that's obvious nonsense.
So, estimate however you like, based on current rate or average rate so far, rolling averages discarding outliers, or whatever. Depends on the operation and the typical durations of events which delay it, so you might have different algorithms when you know the file copy involves a network drive. But until your estimate has been fairly consistent for a period of time equal to the lesser of 30 seconds or 10% of the estimated time, display "oh dear, there seems to be some kind of holdup" when it's massively slowed, or just ignore it if it's massively sped up.
For example, dialog messages taken at 1-second intervals when a connection briefly stalls:
remaining: 60 seconds // estimate is 60 seconds
remaining: 59 seconds // estimate is 59 seconds
remaining: delayed [was 59 seconds] // estimate is 12 hours
remaining: delayed [was 59 seconds] // estimate is infinity
remaining: delayed [was 59 seconds] // got data: estimate is 59 seconds
// six seconds later
remaining: 53 seconds // estimate is 53 seconds
Most of all I would never display seconds (only hours and minutes). I think it's really frustrating when you sit there and wait for a minute while the timer jumps between 10 and 20 seconds. And always display real information like: xxx/yyyy MB copied.
I would also include something like this:
if timeLeft > 5h --> Inform user that this might not work properly
if timeLeft > 10h --> Inform user that there might be better ways to move the file
if timeLeft > 24h --> Abort and check for problems
I would also inform the user if the estimated time varies too much
And if it's not too complicated, there should be an auto-check function that checks if the process is still alive and working properly every 1-10 minutes (depending on the application).
Speaking about network file copy, the best thing is to calculate the file size to be transferred, the network response time, and so on. An approach I used once was:
Connection speed: ping and calculate the round-trip time for packets of 15 KB.
Take the file size and work out, theoretically, how long it would take if it were broken into 15 KB packets at that connection speed.
Recalculate the connection speed after the transfer has started and adjust the estimated time accordingly.
I've been pondering on this one myself. I have a copy routine - via a Windows Explorer style interface - which allows the transfer of selected files from an Android Device, to a PC.
At the start, I know the total size of the file(s) that are to be copied, and as I am using C#.NET, I am using a Stopwatch, to get the elapsed time, and while the copy is in progress, I am keeping a total of what is copied so far, in terms of bytes.
I haven't actually tested it yet, but the best way seems to be this -
estimated = elapsed * ((totalSize - copiedSoFar) / copiedSoFar)
I never saw it the way you guys are explaining it, by transferred bytes and total bytes.
The "experience" always made a lot more sense (not good/accurate) if you instead use bytes of each file, and file count. This is how the estimate swings wildly.
If you are transferring large files first, the estimate runs long, even with a steady connection. It is as if it naively assumes that all files are the average size of those transferred so far, and then guesses that this average file size will hold for the rest of the transfer.
This, and the other approaches, all get worse when the connection speed varies...