Use Media Foundation H.264 encoder with live555

I want to create an H.264 RTSP stream using the live555 streaming library. For encoding the video frames, I want to use the H.264 encoder MFT. Encoding works using the basic processing model (I do not build a graph, but call the MFT manually). Streaming using a custom FramedSource source also seems to work in the sense that the programme is not crashing and the stream is stable in VLC player. However, the image is crippled - no colour, weird line patterns etc.
I assume that I pass the wrong data from the encoder into the streaming library, but I have not been able to find out what the library is actually expecting. I have read that the Microsoft H.264 encoder outputs more than one NAL in a sample. I further found that live555 requires a single NAL to be returned in doGetNextFrame. Therefore, I try to identify the individual NALs (the question "What does this H264 NAL Header Mean?" states that the start code can be 3 or 4 bytes long; I do not know where to find out which one MF uses, but the memory view of the debugger suggests 4 bytes):
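// Needs <cstring> for memcmp. Only 4-byte start codes (00 00 00 01) are matched.
static const BYTE startCode[] = { 0, 0, 0, 1 };
for (DWORD i = 0; i + sizeof(startCode) <= sampleLen; ++i) {
    // memcmp avoids the unaligned 4-byte read and never reads past the buffer end.
    if (::memcmp(sampleData + i, startCode, sizeof(startCode)) == 0) {
        nals.push_back(sampleData + i);
    }
}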
This piece of code usually identifies more than one item in one output sample from the MFT. However, if I copy the ranges found by this loop into the fTo output buffer, VLC does not show anything and stops after a few seconds. I also read somewhere that live555 does not want the magic number 0x00000001, so I tried to skip it. The effect on the client side is the same.
Is there any documentation on what live555 expects me to copy into the output buffer?
Does the H.264 encoder in Media Foundation at all produce output samples which I can use for streaming?
Do I need to split the output samples? How much do I need to skip once I have found a magic number? ("How to write a Live555 FramedSource to allow me to stream H.264 live" suggests that I might need to skip more than the magic number, because the accepted answer only passes the payload part of the NAL.)
Is there any way to test whether the samples returned by the H.264 MFT in basic processing mode form a valid H.264 stream?

Here's how I did it: MFWebCamRtp.
I was able to stream my webcam feed and view it in VLC. There was no need to dig into NALs or such. Each IMFSample from the Media Foundation H264 encoder contains a single NAL that can be passed straight to live555.
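Should a sample nevertheless contain several NAL units, as the question describes, a rough sketch of splitting an Annex B buffer could look like the following. It treats both 3-byte and 4-byte start codes and returns the payload without the start code, which is what the question suggests live555 wants. The names NalSpan and splitAnnexB are made up for this illustration; nothing here is a Media Foundation or live555 API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One NAL unit inside a larger buffer: payload only, start code excluded.
struct NalSpan {
    const std::uint8_t* data;
    std::size_t size;
};

// Splits an Annex B buffer on 00 00 01 start codes. A 4-byte start code is a
// 3-byte one preceded by a zero byte, so zero bytes directly before a start
// code are trimmed from the previous unit (a NAL unit never ends in 0x00).
std::vector<NalSpan> splitAnnexB(const std::uint8_t* buf, std::size_t len) {
    std::vector<NalSpan> nals;
    std::size_t start = len;  // payload start of the current NAL; len means "none yet"

    // Emits the unit [start, end) after trimming trailing zero bytes.
    auto emit = [&](std::size_t end) {
        while (end > start && buf[end - 1] == 0) --end;
        if (end > start) nals.push_back({buf + start, end - start});
    };

    for (std::size_t i = 0; i + 3 <= len; ++i) {
        if (buf[i] != 0 || buf[i + 1] != 0 || buf[i + 2] != 1) continue;
        if (start != len) emit(i);
        start = i + 3;  // payload begins right after the start code
        i += 2;         // skip the remaining start code bytes
    }
    if (start < len) emit(len);
    return nals;
}
```

Each returned span could then be handed out one at a time, for example one span per doGetNextFrame call, copying span.data into fTo. The first payload byte is the NAL header, so it stays in.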

Related

Extract Raw Video Data From WebRTC?

I have a video being streamed over WebRTC. I wish to extract the raw video data as delivered by the server (byte for byte).
The reason for this is that I want to read the PTS timestamps from the video in order to determine how long a live stream has been playing.
I can utilize the MediaRecorder API and record the MediaStream I'm getting back from the RTCPeerConnection, however doing this re-encodes the video. When I do this I get back a WEBM file that, once parsed, has PTS values starting at 0.
Is there a way to retrieve the raw data being delivered over the RTCPeerConnection without using a MediaRecorder?

Container format of this RTSP stream

I would like to know the container format of the following stream:
rtsp://8.15.251.47:1935/rtplive/FairfaxVideo3595
According to ffprobe, the container format is RTSP (format_long_name = RTSP input).
I also looked through the debug messages in VLC but I did not find any information on the stream's container format. What I DID find was that the codec was H264 and that VLC was using live555 to decode the stream. The media files live555 can support according to their website (http://www.live555.com/mediaServer/) make me think that the above stream is an H264 elementary stream and is not in a container format. Am I correct?
Also, if the stream indeed does not have a container format, is it ok to say the container format is RTP (not RTSP as ffprobe says) because that's the protocol used to send the media data?
Thanks!
RTSP is more of a handshake done with the server, while RTP is the actual stream coming in once the handshake is done and you start streaming. RTSP URLs usually start with rtsp://... and the sequence of requests goes roughly like this:
DESCRIBE, SETUP, PLAY, TEARDOWN
The response from the server to DESCRIBE will contain the information you need to know about the encoding of the stream (H264, JPEG, etc.), while PLAY will cause the server to start sending the RTP stream. I suggest looking up SDP (Session Description Protocol) for how to extract this information.
In the case of streams, you are most likely correct, since the protocol used for streaming is usually RTP, and it tends to go hand in hand with RTSP (however, I'm unsure whether we can apply the term container in the context of streaming).
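To make the SDP part a bit more concrete, here is a small sketch that pulls the encoding name out of the a=rtpmap lines of a DESCRIBE response body. The struct RtpMap and the function parseRtpMaps are names invented for this example, and it only handles the rtpmap attribute, not the rest of SDP.

```cpp
#include <cstddef>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

struct RtpMap {
    int payloadType;       // RTP payload type number, e.g. 96
    std::string encoding;  // e.g. "H264"
    int clockRate;         // e.g. 90000
};

// Collects every "a=rtpmap:<pt> <encoding>/<clock>[/<params>]" line of an SDP body.
std::vector<RtpMap> parseRtpMaps(const std::string& sdp) {
    std::vector<RtpMap> maps;
    const std::string prefix = "a=rtpmap:";
    std::istringstream in(sdp);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // SDP lines end in CRLF
        if (line.compare(0, prefix.size(), prefix) != 0) continue;
        const std::size_t space = line.find(' ', prefix.size());
        if (space == std::string::npos) continue;
        const std::size_t slash = line.find('/', space + 1);
        if (slash == std::string::npos) continue;
        try {
            RtpMap m;
            m.payloadType = std::stoi(line.substr(prefix.size(), space - prefix.size()));
            m.encoding = line.substr(space + 1, slash - space - 1);
            m.clockRate = std::stoi(line.substr(slash + 1));  // stops at an optional "/params"
            maps.push_back(m);
        } catch (const std::exception&) {
            // Malformed number in the line: skip it.
        }
    }
    return maps;
}
```

For a video track, an SDP body containing a line such as "a=rtpmap:96 H264/90000" would yield one entry with encoding "H264", which answers the codec question without looking at any RTP packet.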

H264 with multiple PPS and SPS

I have a card that produces an H264 stream with an SPS (Sequence Parameter Set) and a PPS (Picture Parameter Set), in that order, directly before each I-Frame. I see that most H264 streams contain a PPS and SPS at the first I-Frame.
Is this recommended? Do decoders/muxers typically support multiple PPS and SPS?
H.264 comes in a variety of stream formats. One variation is called "Annex B".
(AUD)(SPS)(PPS)(I-Slice)(PPS)(P-Slice)(PPS)(P-Slice) ... (AUD)(SPS)(PPS)(I-Slice).
Typically you see SPS/PPS before each I frame and PPS before other slices.
Most decoders/muxers are happy with "Annex B" and the repetition of SPS/PPS.
Most decoders/muxers won't do anything meaningful if you change the format and SPS/PPS midstream.
Most decoders/muxers parse the first SPS/PPS as part of a setup process and ignore subsequent SPSs.
Some decoders/muxers prefer H.264 without the (AUD), start codes and SPS/PPS.
Then you have to feed SPS/PPS out of band as part of setting up the decoders/muxers.
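To see which of these units a given stream actually contains, you can look at the first byte after each start code: its low five bits are the nal_unit_type. A minimal sketch (nalUnitType and nalTypeName are just illustrative helper names, and only the types mentioned above are spelled out):

```cpp
#include <cstdint>
#include <string>

// nal_unit_type is stored in the low 5 bits of the NAL header byte,
// i.e. the first byte after the Annex B start code.
inline int nalUnitType(std::uint8_t header) {
    return header & 0x1F;
}

// Human-readable names for the NAL unit types discussed here.
inline std::string nalTypeName(int type) {
    switch (type) {
        case 1: return "non-IDR slice";
        case 5: return "IDR slice";
        case 6: return "SEI";
        case 7: return "SPS";
        case 8: return "PPS";
        case 9: return "AUD";
        default: return "other";
    }
}
```

For example, the header bytes 0x67, 0x68 and 0x65 that commonly show up in a hex dump map to SPS, PPS and IDR slice respectively.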
An IDR frame, or an I-slice can not be decoded without a SPS and PPS. In the case of a container like mp4, the SPS and PPS is stored away from the video data in the file header. Upon playback the mp4 is parsed, the SPS/PPS is used to configure the AVC decoder once, then video can be played back starting at any IDR/I-slice.
There is a second scenario, Live video. With live video, there is no file header, because there is no file. So when a TV tunes into a channel, where does it get the SPS/PPS? Because television is broadcast, meaning the television has no way to request the SPS/PPS, it is repeated in the stream.
So when you start encoding video, your encoder does not know what you intend to do with the video. Now if the extra SPS/PPS show up in an mp4, the decoder just ignores them, but if you are streaming to a TV, without them the stream would never play. So most encoders default to repeating SPS/PPS just in case.
I know the Matroska (mkv) spec: there, SPS and PPS are stored only once, in the codec private data section, so they are not repeated with every I-frame or IDR frame.
If every I-frame/IDR frame of your H.264 stream carries SPS/PPS, the Matroska muxer will store only one copy in the codec private data.
So container formats aimed at storage suggest keeping only one copy of SPS/PPS, whereas formats aimed at broadcasting and streaming suggest sending SPS/PPS before every I-frame/IDR frame, or at least whenever the codec configuration of the H.264 stream changes.
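As an illustration of the "store one copy" idea, the sketch below walks over a list of NAL units (start codes already removed) and remembers only the first SPS and the first PPS, ignoring later repeats. This is not how any particular muxer is implemented; Nal, ParameterSets and firstParameterSets are hypothetical names, and a real muxer would additionally wrap the two units in its own codec private structure.

```cpp
#include <cstdint>
#include <vector>

using Nal = std::vector<std::uint8_t>;  // one NAL unit, start code removed

struct ParameterSets {
    Nal sps;  // stays empty if the stream contained no SPS
    Nal pps;  // stays empty if the stream contained no PPS
};

// Remembers the first SPS (nal_unit_type 7) and the first PPS (type 8)
// and ignores any later repetitions.
ParameterSets firstParameterSets(const std::vector<Nal>& nals) {
    ParameterSets out;
    for (const Nal& nal : nals) {
        if (nal.empty()) continue;
        const int type = nal[0] & 0x1F;
        if (type == 7 && out.sps.empty()) out.sps = nal;
        if (type == 8 && out.pps.empty()) out.pps = nal;
    }
    return out;
}
```

Note that this assumes the parameter sets do not change midstream; if they do, keeping only the first copy would lose information.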

Access StageFright.so directly to decode H.264 stream from JNI layer in Android

Is there a way to access libstagefright.so directly to decode H.264 stream from JNI layer on Android 2.3 or above?
If your objective is to decode an elementary H.264 stream, then your code will have to ensure that the stream is extracted, that the codec-specific data (primarily the SPS and PPS) is provided to the codec, and that the frame data is provided to the codec along with time-stamps. Across all Android versions, the most common interface would be OMXCodec, which is an abstraction over an underlying OMX component.
In Gingerbread (Android 2.3) and ICS (Android 4.0.0), if you would like to create a decoder, the best method would be to create an OMXCodec component and abstract your code through a MediaSource interface i.e. your wrapper code is modeled as MediaSource and OMXCodec reads from this source and performs the decoding.
Link to Android 2.3 Video decoder creation: http://androidxref.com/2.3.6/xref/frameworks/base/media/libstagefright/AwesomePlayer.cpp#1094
Link to Android 4.0.0 Video decoder creation: http://androidxref.com/4.0.4/xref/frameworks/base/media/libstagefright/AwesomePlayer.cpp#1474
The main challenges would be the following:
Model the input as a MediaSource.
Write wrapper code to read each buffer from the codec, handle it and release it back to the codec.
For simplification, you could look at the code of the stagefright command-line executable, as in http://androidxref.com/4.0.4/xref/frameworks/base/cmds/stagefright/stagefright.cpp#233
However, if your program targets JellyBean (Android 4.1.x, 4.2.x) onwards, things are somewhat simpler. From your JNI code, you could create a MediaCodec component and employ it for decoding. To integrate it into your program, you could refer to the SimplePlayer implementation as in http://androidxref.com/4.2.2_r1/xref/frameworks/av/cmds/stagefright/SimplePlayer.cpp#316

detect key-frame in TS with H264 codec

Is there an easy, not horrifyingly complex way to detect a key-frame in an H264 video stream wrapped in a Transport Stream?
Also, if extra previous packets are needed for decoding the key-frame, is there a way to find those as well?
There is no super simple way of finding the I frame. You have to read the transport stream packets of the AVC stream. Then you have to assemble the packetized elementary stream (PES) packets, strip the PES header and then look for NAL units of type 5 (IDR slices).
So you will need a transport stream demuxer, find the beginning of PES packets and do minimal H.264 parsing.
For demuxing you could look at this source code: http://tsdemuxer.googlecode.com/svn/trunk/v1.0/tsdemux.cpp
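To give an idea of how much parsing is involved, here is a rough sketch of the two ends of that pipeline: reading the 4-byte header of a 188-byte transport stream packet, and scanning an already reassembled elementary stream buffer (PES header stripped) for an IDR slice. The PES reassembly in between is left out. TsHeader, parseTsHeader and containsIdr are names made up for this example and are not taken from the linked demuxer.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kTsPacketSize = 188;

struct TsHeader {
    bool valid;                 // sync byte found and payload offset plausible
    bool payloadUnitStart;      // a new PES packet starts in this TS packet
    std::uint16_t pid;          // 13-bit packet identifier
    std::size_t payloadOffset;  // kTsPacketSize if the packet carries no payload
};

// Parses the fixed header of one 188-byte TS packet.
TsHeader parseTsHeader(const std::uint8_t* p) {
    TsHeader h{false, false, 0, kTsPacketSize};
    if (p[0] != 0x47) return h;  // every TS packet starts with the sync byte 0x47
    h.payloadUnitStart = (p[1] & 0x40) != 0;
    h.pid = static_cast<std::uint16_t>(((p[1] & 0x1F) << 8) | p[2]);
    const int adaptationFieldControl = (p[3] >> 4) & 0x3;
    std::size_t offset = 4;
    if (adaptationFieldControl & 0x2) offset += 1 + p[4];  // length byte + adaptation field
    if (offset > kTsPacketSize) return h;                  // corrupt packet
    if (adaptationFieldControl & 0x1) h.payloadOffset = offset;
    h.valid = true;
    return h;
}

// Returns true if an Annex B elementary stream buffer contains a NAL unit
// of type 5 (IDR slice).
bool containsIdr(const std::uint8_t* es, std::size_t len) {
    for (std::size_t i = 0; i + 3 < len; ++i) {
        if (es[i] == 0 && es[i + 1] == 0 && es[i + 2] == 1 &&
            (es[i + 3] & 0x1F) == 5) {
            return true;
        }
    }
    return false;
}
```

In use, you would filter packets by the video PID, start a new buffer whenever payloadUnitStart is set, append the payload bytes of the following packets, and run containsIdr on the result once the PES header has been skipped.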