Encoding / decoding performance estimation question of HDVICP2 of TI DM385

Encoding / decoding performance estimation question of HDVICP2 of TI DM385 - h.264

I am estimating the encoding and decoding performance of HDVICP2 of TI DM385. The SOC is very very old so FAEs just ignored me.
My question is: How to calculate encoding / decoding performance of HDVICP2 encoder / decoder correctly?
For example:
According to "Table 2 Cycle Information" of datasheet "H264 High Profile Decoder 2.0 on HDVICP2 and Media Controller Based Platform Data Sheet (SPRS839)", I can find the average and peak cycles per second of test file "station_p1920x1080_7mbps_IPB_30fps.264" are 155.57 and 160.63 separately.
If every operation of HDVICP2 takes only one cycle, I can simply say that a 1080p60 video with the same characteristic will take about 2 times cycles per second so the decoder is able to deal with only one 1080p60 video but unable to deal with more 1080p60 video at the same time. (If working frequency is 533MHZ, 155.57*2=311 and smaller than 533)
Is above hypothesis correct? If no, please answer me why. Thanks very much.

Related

Anomaly Detection with Autoencoder using unlabelled Dataset (How to construct the input data)

I am new in deep learning field, i would like to ask about unlabeled dataset for Anomaly Detection using Autoencoder. my confusing part start at a few questions below:
1) some post are saying separated anomaly and non-anomaly (assume is labelled) from the original dataset, and train AE with the only non-anomaly dataset (usually amount of non-anomaly will be more dominant). So, the question is how am I gonna separate my dataset if it is unlabeled?
2) if I train using the original unlabeled dataset, how to detect the anomaly data?

Label of data doesn't go into autoencoder.
Auto Encoder consists of two parts
Encoder and Decoder
Encoder: It encodes the input data, say a sample with 784 features to 50 features
Decoder: from those 50 features it converts it back to original feature i.e 784 features.
Now to detect anomaly,
if you pass an unknown sample, it should be converted back to its original sample without much loss.
But if there is a lot of error in converting it back. then it could be an anomaly.
Picture Credit: towardsdatascience.com

I think you answered the question already yourself in part: The definition of an anomaly is that it should be considered "a rare event". So even if you don't know the labels, your training data will contain only very few such samples and predominantly learn on what the data usually looks like. So both during training as well as at prediction time, your error will be large for an anomaly. But since such examples should come up only very seldom, this will not influence your embedding much.
In the end, if you can really justify that the anomaly you are checking for is rare, you might not need much pre-processing or labelling. If it occurs more often (a threshold is hard to give for that, but I'd say it should be <<1%), your AE might pick up on that signal and you would really have to get the labels in order to split the data... . But then again: This would not be an anomaly any more, right? Then you could go ahead and train a (balanced) classifier with this data.

understanding getByteTimeDomainData and getByteFrequencyData in web audio

The documentation for both of these methods are both very generic wherever I look. I would like to know what exactly I'm looking at with the returned arrays I'm getting from each method.
For getByteTimeDomainData, what time period is covered with each pass? I believe most oscopes cover a 32 millisecond span for each pass. Is that what is covered here as well? For the actual element values themselves, the range seems to be 0 - 255. Is this equivalent to -1 - +1 volts?
For getByteFrequencyData the frequencies covered is based on the sampling rate, so each index is an actual frequency, but what about the actual element values themselves? Is there a dB range that is equivalent to the values returned in the returned array?

getByteTimeDomainData (and the newer getFloatTimeDomainData) return an array of the size you requested - its frequencyBinCount, which is calculated as half of the requested fftSize. That array is, of course, at the current sampleRate exposed on the AudioContext, so if it's the default 2048 fftSize, frequencyBinCount will be 1024, and if your device is running at 44.1kHz, that will equate to around 23ms of data.
The byte values do range between 0-255, and yes, that maps to -1 to +1, so 128 is zero. (It's not volts, but full-range unitless values.)
If you use getFloatFrequencyData, the values returned are in dB; if you use the Byte version, the values are mapped based on minDecibels/maxDecibels (see the minDecibels/maxDecibels description).

Mozilla 's documentation describes the difference between getFloatTimeDomainData and getFloatFrequencyData, which I summarize below. Mozilla docs reference the Web Audio
experiment ; the voice-change-o-matic. The voice-change-o-matic illustrates the conceptual difference to me (it only works in my Firefox browser; it does not work in my Chrome browser).
TimeDomain/getFloatTimeDomainData
TimeDomain functions are over some span of time.
We often visualize TimeDomain data using oscilloscopes.
In other words:
we visualize TimeDomain data with a line chart,
where the x-axis (aka the "original domain") is time
and the y axis is a measure of a signal (aka the "amplitude").
Change the voice-change-o-matic "visualizer setting" to Sinewave to
see getFloatTimeDomainData(...)
Frequency/getFloatFrequencyData
Frequency functions (GetByteFrequencyData) are at a point in time; i.e. right now; "the current frequency data"
We sometimes see these in mp3 players/ "winamp bargraph style" music players (aka "equalizer" visualizations).
In other words:
we visualize Frequency data with a bar graph
where the x-axis (aka "domain") are frequencies or frequency bands
and the y-axis is the strength of each frequency band
Change the voice-change-o-matic "visualizer setting" to Frequency bars to see getFloatFrequencyData(...)
Fourier Transform (aka Fast Fourier Transform/FFT)
Another way to think about "time domain vs frequency" is shown the diagram below, from Fast Fourier Transform wikipedia
getFloatTimeDomainData gives you the chart on on the top (x-axis is Time)
getFloatFrequencyData gives you the chart on the bottom (x-axis is Frequency)
a Fast Fourier Transform (FFT) converts the Time Domain data into Frequency data, in other words, FFT converts the first chart to the second chart.

cwilso has it backwards.
the time data array is the longer one (fftSize), and the frequency data array is the shorter one (half that, frequencyBinCount).
fftSize of 2048 at the usual sample rate of 44.1kHz means each sample has 1/44100 duration, you have 2048 samples at hand, and thus are covering a duration of 2048/44100 seconds, which 46 milliseconds, not 23 milliseconds. The frequencyBinCount is indeed 1024, but that refers to the frequency domain (as the name suggests), not the time domain, and it the computation 1024/44100, in this context, is about as meaningful as adding your birth date to the fftSize.
A little math illustrating what's happening: Fourier transform is a 'vector space isomorphism', that is, a mapping going bijectively (i.e., reversible) between 2 vector spaces of the same dimension; the 'time domain' and the 'frequency domain.' The vector space dimension we have here (in both cases) is fftSize.
So where does the 'half' come from? The frequency domain coefficients 'count double'. Either because they 'actually are' complex numbers, or because you have the 'sin' and the 'cos' flavor. Or, because you have a 'magnitude' and a 'phase', which you'll understand if you know how complex numbers work. (Those are 3 ways to say the same in a different jargon, so to speak.)
I don't know why the API only gives us half of the relevant numbers when it comes to frequency - I can only guess. And my guess is that those are the 'magnitude' numbers, and the 'phase' numbers are thrown out. The reason that this is my guess is that in applications, magnitude is far more important than phase. Still, I'm quite surprised that the API throws out information, and I'd be glad if some expert who actually knows (and isn't guessing) can confirm that it's indeed the magnitude. Or - even better (I love to learn) - correct me.

Time stamp and composition time offset

Does H.264 buffer contains Time stamp and decoding time stamp information.
when we get the H.264 nalu data does that contain timing information in it?

If you mean raw H.264 NAL units than no they don't contain timing information if mean something like PTS/DTS. Timestamps are on higher level in containers like MKV/MP4/TS. The only time related information in H.264 specs afaik are num_units_in_tick/time_scale in VUI that can be used to finding FPS in case of constant frame rate (fixed_frame_rate_flag = 1), and some fields in Picture timing SEI but as they are optional and not really well specified so nobody really use them.

x264 rate control modes

Recently I am reading the x264 source codes. Mostly, I concern the RC part. And I am confused about the parameters --bitrate and --vbv-maxrate. When bitrate is set, the CBR mode is used in frame level. If you want to start the MB level RC, the parameters bitrate, vbv-maxrate and vbv-bufsize should be set. But I don't know the relationship between bitrate and vbv-maxrate. What is the criterion of the real encoding result when bitrate and vbv-maxrate are both set?
And what is the recommended value for bitrate? Equals to vbv-maxrate?
Also what is the recommended value for vbv-bufsize? Half of vbv-maxrate?
Please give me some advice.

bitrate address the "target filesize" when you are doing encoding. It is understandably confusing because it applies a "budget" of certain size and then tries to apportion this budget on the frames - that is why the later parts of a movie get a smaller amount of data which results in lower video quality. For example, if you have 10 seconds of complete black images followed by 10 second of natural video - the final encoded file will be very different than if the order was the opposite.
vbv-bufsize is the buffer that has to be completed before a "transmission" would occur say in a streaming scenario. Now, let's tie this to I-frames and P-frames: the vbv-bufsize will limit the size of any of your encoded video frames - most likely the I-frame.

How to use FFT for large chunks of data to plot amplitude-frequency response?

I am a programmer and not a good mathematician so FFT is like some black box to me, I would like t throw some data into some FFT library and get out a plottable AFR (amplitude-frequency response) data, like some software like Rightmark audio does:
http://www.ixbt.com/proaudio/behringer/3031a/fr-hf.png
Now I have a system which plays back a logarithmic swept sine (with short fade-in/fade-out to avoid sharp edges) and records the response from the audio system.
As far as I understand, I need to pad the input with zeros to 2^n, use audio samples as a real part of a complex numbers, set imaginary=0, and I'll get back from FFT the frequency bins array whith half length of input data.
But if I do not need as big frequency resolution as some seconds audio buffer give to me, then what is the right way to make, lets say, 1024 size FFT window, feed chunks of audio and get back 512 frequency points which take into account all the data I passed in? Or maybe it is not possible and I need to feed entire swept sine at once to get back all the AFR data I need?
Also is there any smoothing needed? I have seen that the raw output from FFT may be really noisy. What is the right way to avoid the noise as early as possible, so I see the noise only as it comes from the AFR itself and not from FFT calculations (like the image in the link I have given - it seems pretty smooth)?
I am a C++/C# programmer. I would be grateful for any examples which show how to process chunks of swept sine end get back AFR data. For now I have found only examples which process data in small chunks in realtime, and that is not what I need.

Window function should help you reducing the noise
All you need to do is multiply your input data by w(n) :

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008