How to make Media Foundation H.264 decoder work? - h.264

For some reason I'm not able to decode H.264.
The input/output configuration went well, just like input/output buffer creation.
I'm manually feeding the decoder with the H.264 demuxed from a live stream. Therefore, I use MFVideoFormat_H264_ES as media subtype. The decoding is very slow and the decoded frames are complete garbage. Other decoders are decoding the same stream properly.
Weird thing is that once ProcessInput() returns MF_E_NOTACCEPTING, the following ProcessOutput() returns MF_E_TRANSFORM_NEED_MORE_INPUT. According to MSDN, this should never happen.
Can anybody provide some concrete info on how to do it? (assuming that MF H.264 is functional, which I seriously doubt).
I'm willing to provide extra information, but I don't know what somebody might need in order to help.
Edit:
When exactly should I reset the number of bytes in input buffer to zero?
Btw, I'm resetting the output buffer when ProcessOutput() delivers something (garbage).
Edit2:
Without resetting the current buffer size on input buffer to 0, I managed to get some semi valid output. By semi valid I mean that on every successful ProcessOutput() I receive an YUV image where current image contains a few decoded macro blocks more than the previous frame. The rest of the frame is black. Because I do not reset the size, this stops after a while. So, I guess there is a problem in resetting the buffer size and I guess I should get some notification when the whole frame is done (or not).
Edit3:
While creating input buffer, GetInputStreamInfo() returns 4096 as input buffer size. Alignment 0. However, 4k is not enough. Increasing to 4MB helps in decompressing frame fragment by frame fragment. Still have to figure out if there is a way to tell when is the entire frame decoded.

When creating input buffer, GetInputStreamInfo() returns 4096 as buffer size, which is too small.
Setting input buffer to 4MB solved the problem. The buffer can probably be smaller... still have to test that.

Related

Maximum bitstream size for H.264

Although I am quite familiar with H.264 encoding I came down to a point where I need advice from more experienced people. I'm performing hardware accelerated H.264 encoding using Intel Quick Sync and NVIDIA NVENC in a unified pipeline. The issue that troubles me is bitstream output buffer size. Intel Quick Sync provides a way to query the maximum bitstream size from the encoder, while NVDIA NVENC does not have such a feature (or at least I haven't found it, pointers are welcome). In their tutorial they state that:
NVIDIA recommends setting the VBV buffer size equal to single frame size. This is very
helpful in low latency applications, where network bandwidth is a concern. Single frame
VBV allows users to enable capped frame size encoding. In single frame VBV, VBV
buffer size must be set to maximum frame size which is equal to channel bitrate divided
by frame rate. With this setting, every frame can be sent to client immediately upon
encoding and the decoder can also decode without any buffering.
For example, if you have a channel bitrate of B bits/sec and you are encoding at N fps,
the following settings are recommended to enable single frame VBV buffer size.
uint32_t maxFrameSize = B/N;
NV_ENC_RC_PARAMS::vbvBufferSize= maxFrameSize;
NV_ENC_RC_PARAMS::vbvInitialDelay= maxFrameSize;
NV_ENC_RC_PARAMS::maxBitRate= NV_ENC_CONFIG::vbvBufferSize *N; // where N is the encoding frame rate.
NV_ENC_RC_PARAMS::averageBitRate= NV_ENC_RC_PARAMS::vbvBufferSize *N; // where N is the encoding frame rate.
NV_ENC_RC_PARAMS::rateControlMode= NV_ENC_PARAMS_RC_TWOPASS_CBR;
I am allocating a bitstream buffer pool for quite many encoding sessions so having an overhead of unused memory for each buffer by calculating the size from network bandwidth (in my case it is not the bottleneck) will cause ineffective memory usage.
So the general question is - is there any way how to determine the bitstream size for H.264 assuming there is no frame buffering and each frame should generate NAL units? Can I assume that it will never be larger than the input NV12 buffer (which seems unreliable since there may be many NAL units like SPS/PPS/AUD/SEI for the first frame and I am not sure if the size of those plus the same of IDR frame is not greater than the NV12 buffer size)? Does the standard have any pointers on this? Or is it totally encoder dependent?

Is it possible to remove start codes using NVENC?

I'm using NVENC SDK to encode OpenGL frames and stream them over RTSP. NVENC gives me encoded data in the form of several NAL units. In order to stream them with Live555 I need to find the start code (0x00 0x00 0x01) and remove it. I want to avoid this operation.
NVENC has a sliceOffset attribute which I can consult, but it indicates slices, not NAL units. It only points the ending of the SPS and PPS headers, where the actual data starts. I understand that a slice is not equal to a NAL (correct me if I'm wrong). I'm already forcing single slices for encoded data.
Is any of the following possible?
Force NVENC to encode individual NAL units
Force NVENC to indicate where the NAL units in each encoded data block are
Make Live555 accept the sequence parameters for streaming
There seems to be a point where every person trying to do H.264 over RTSP/RTP comes down to this question. Well here are my two cents:
1) There is a concept of an access unit. An access unit is a set of NAL units (may be as well be only one) that represent an encoded frame. That is the level of logic you should work at. If you are saying that you want the encoder to give you individual NAL unit's, then what behavior do you expect when the encoding procedure results in multiple NAL units from one raw frame (e.g. SPS + PPS + coded picture). That being said, there are ways to configure the encoder to reduce the number of NAL units in an access unit (like not including the AUD NAL, not repeating SPS/PPS, exclude SEI NAL's) - with that knowledge you can actually know what to expect and kind of force the encoder to give you single NAL per frame (of course this will not work for all frames, but with the knowledge you have about the decoder you can handle that). I'm not an expert on the NVENC API, I've also just started using it, but at least as for Intel Quick Sync, turning off AUD,SEI and disabling repetition of PPS/SPS gave me roughly around 1 NAL per frame for frames 2...N.
2) Won't be able to answer this since as I mentioned I'm not familiar with the API but I highly doubt this.
3) SPS and PPS should be in the first access unit (the first bit-stream you get from the encoder) and you could just find the right NAL's in the bit-stream and extract them, or there may be a special API call to obtain them from the encoder.
All that being said, I don't think it is that hard to actually run through the bit-stream, parse the start codes and extract the NAL unit's and feed them to Live555 one by one. Of course, if the encoder offers to output the bit-stream in the AVCC format (compared to the start codes or Annex B it uses interleaved length value between the NAL units so you can just jump to the next one without looking for the prefix) then you should use it. When it is just RTP it's easy enough to implement the transport yourself, since I've had bad luck with GStreamer that did not have proper support for FU-A packetization, in case of RTSP the overhead of the transport infrastructure is bigger and it is reasonable to use a 3rd party library like Live555.

HLS - how to reduce delay?

Anyone know how configure the HLS media server for reduce a little bit the delay of live streaming video?
what types of parameters i need to change?
I had heard that you could do some tuning using parameters like this: HLSMediaFileDuration
Thanks in advance
A Http Live Streaming system typically has an encoder which produces segments of a certain number of seconds and a media server (web server) which serves playlists containing a list of URLs to these segments to player applications.
Media Files = Segments = .ts files = MPEG2-TS files (in HLS speak)
There are some ways to reduce the delay in HLS:
Reduce the encoded segment length from Apple's recommended 10 seconds to 5 seconds or less. Reducing segment length increases network overhead and load on the web server.
Use lower bitrates, larger .ts files take longer to upload and download. If you use multi-bitrate streams, make sure the first bitrate listed in the playlist is a little lower than the bitrate most of your users use. This will reduce the time to start playing back the stream
Get the segments from the encoder to the web server faster. Upload while still encoding if possible. Update the playlist as soon as the segment has finished uploading
Also remember that the higher the delay the better the quality of your stream (low delay = lower quality). With larger segments, there is less overhead so more space for video data. Taking a longer time to encode results in better quality. More buffering results in less chance of video streams stuttering on playback.
HLS is all about trading quality of playback for longer delay, so you will never be able to use HLS for things like video conferencing. Typical delay in HLS is 30-60 sec, minimum in practice is around 15 sec. If you want low delay use RTP for streaming, but good luck getting good quality on low or variable speed networks.
Please specify which media server you use. Generally speaking, yes - changing chunk size will definitely affect delay time. The less is the first chunk, the quicker the video will be shown in the player.
Actually, Apple recommend to divide your file to small chunks this equal length of file, integers.
In practice, there is huge difference between players. Some of them parse manifest changing this values.
Known practice is to pre-cache in memory first chunks in low & medium resolution (Or try to download them in background of app/page - Amazon does this, though their video is MSS)
I was having the same problem and the keys for me were:
Lower the segment length. I set it to 2s because I'm streaming on a local network. In other type of networks, you need to be careful with the overhead that a low segment length adds that can impact your playback quality.
In your manifest, make sure the #EXT-X-TARGETDURATION is accurate. From here:
The EXT-X-TARGETDURATION tag specifies the maximum Media Segment
duration. The EXTINF duration of each Media Segment in the Playlist
file, when rounded to the nearest integer, MUST be less than or equal
to the target duration; longer segments can trigger playback stalls
or other errors. It applies to the entire Playlist file.
For some reason, the #EXT-X-TARGETDURATION in my manifest was set to 5 and I was seeing a 16-20s delay. After changing that value to 2, which is the correct one according to my segments' length, I am now seeing delays of 6-10s.
In summary, you should expect a delay of at least 3X your #EXT-X-TARGETDURATION. So, lowering the segment length and the #EXT-X-TARGETDURATION value should help reducing the delay.

Obtain the result ByteArray of the current playing sounds

I am developing an AIR application for desktop that simulate a drum set. Pressing the keyboard will result in a corresponding drum sound played in the application. I have placed music notes in the application so the user will try to play a particular song.
Now I want to record the whole performance and export it to a video file, say flv. I have already succeed in recording the video using this encoder:
http://www.zeropointnine.com/blog/updated-flv-encoder-alchem/
However, this encoder does not have the ability to record sound automatically. I need to find a way to get the sound in ByteArray at that particular frame, and pass it to the encoder. Each frame may have different Sound objects playing at the same time, so I need to get the bytes of the final sound.
I am aware that SoundMixer.computeSpectrum() can return the current sound in bytes. However, the ByteArray returned has a fixed length of 512, which does not fit in the requirement of the encoder. After a bit of testing, with a sample rate 44khz 8 bit stero, the encoder expects the audio byte data array to have a length of 5880. The data returned by SoundMixer.computeSpectrum() is much much shorter than the encoder required.
My application is running at 60FPS, and recording at 15FPS.
So my question is: Is there any way I can obtain the audio bytes in the current frame, which is mixed by more than one Sound objects, and has the data length enough for the encoder to work? If there is no API to do that, I will have to mix the audio and get the result bytes by myself, how can that be done?

Flex/Actionscript determine if NetStream has audio by analysing audioBytesPerSecond

I need to "look" at a NetStream and determine if I'm receiving audio. From what I investigated, i may use the property audioBytesPerSecond from NetStreamInfo:
"(audioBytesPerSecond) Specifies the rate at which the NetStream audio
buffer is filled in bytes per second. The value is calculated as a
smooth average for the audio data received in the last second."
I also learned that NetStream may have contain some overhead bytes from the network so, which is the minimum reasonable audioBytesPerSecond value to determine if NetStream is playing audio (and not just noise, for example)?
Can this analysis be done this way?
Thanks in advance!
Yes you can do it this way. It's rather subjective, however.
Try to find a threshold that works for you. We used 5 kilobits/sec in the past. If the amount of data falls below this value, they are likely not sending any audio. Note, we were using the stream.info.byteCount property (you might want a slightly lower value if you're using auiodBytesPerSecond).
This is pretty easy to observe if you speak into the microphone and periodically check audioBytesPerSecond or the other counters/statistics that are available.