Why doesn't H.264 over RTP contain NALU start codes?

I read https://stackoverflow.com/a/24890903/12279500. When I look at H.264 over RTP I can recognize the SPS, PPS, IDR and so on, but I don't see an H.264 start code before each NALU.
Why is that?
How many H.264 formats are there, not including Annex B and AVCC?

RTP has its own payload format for H.264, described in RFC 6184.
As for how many formats there are, assume infinitely many, because nothing stops anybody from creating more.

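Start codes are used to delimit the NALUs in a byte stream because the NALU header carries no length information. In RTP, the NALUs travel in the payload field of each packet, so the packet boundaries already delimit them and start codes are not needed. You only need to handle each RTP packet's payload. Note that RFC 6184 also defines aggregation (STAP) and fragmentation (FU) packets, so one payload is not always exactly one NALU.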
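To make this concrete, here is a minimal C++ sketch of turning RTP payloads back into an Annex B byte stream. It assumes the RTP header has already been stripped, and it only covers single NAL unit packets and STAP-A aggregation packets as laid out in RFC 6184. FU-A fragments need reassembly across packets and are only detected here. The names classify, appendAnnexB and payloadToAnnexB are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Packet kinds from RFC 6184, selected by the low 5 bits of the first payload byte.
enum class PayloadKind { SingleNal, StapA, FuA, Other };

PayloadKind classify(const std::vector<std::uint8_t>& payload) {
    if (payload.empty()) return PayloadKind::Other;
    const std::uint8_t type = payload[0] & 0x1F;
    if (type >= 1 && type <= 23) return PayloadKind::SingleNal;
    if (type == 24) return PayloadKind::StapA;
    if (type == 28) return PayloadKind::FuA;
    return PayloadKind::Other;
}

// Appends one NAL unit to an Annex B stream, preceded by a 4-byte start code.
void appendAnnexB(std::vector<std::uint8_t>& out,
                  const std::uint8_t* nal, std::size_t len) {
    static const std::uint8_t kStartCode[] = {0, 0, 0, 1};
    out.insert(out.end(), kStartCode, kStartCode + 4);
    out.insert(out.end(), nal, nal + len);
}

// Converts a single-NAL or STAP-A payload to Annex B.
// Returns false for kinds this sketch does not handle or for malformed input;
// on a malformed STAP-A, `out` may already hold the NAL units parsed so far.
bool payloadToAnnexB(const std::vector<std::uint8_t>& payload,
                     std::vector<std::uint8_t>& out) {
    switch (classify(payload)) {
    case PayloadKind::SingleNal:
        appendAnnexB(out, payload.data(), payload.size());
        return true;
    case PayloadKind::StapA: {
        std::size_t pos = 1;  // skip the STAP-A header byte
        while (pos + 2 <= payload.size()) {
            // Each aggregated NAL unit is preceded by a 16-bit big-endian size.
            const std::size_t len =
                (static_cast<std::size_t>(payload[pos]) << 8) | payload[pos + 1];
            pos += 2;
            if (len == 0 || pos + len > payload.size()) return false;
            appendAnnexB(out, payload.data() + pos, len);
            pos += len;
        }
        return true;
    }
    default:
        return false;
    }
}
```

The STAP-A branch shows the point of the answer: inside RTP the NALU boundaries come from the packet boundary or from explicit length fields, never from start codes.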

Related

Container format of this RTSP stream

I would like to know the container format of the following stream:
rtsp://8.15.251.47:1935/rtplive/FairfaxVideo3595
According to ffprobe, the container format is RTSP (format_long_name = RTSP input).
I also looked through the debug messages in VLC but I did not find any information on the stream's container format. What I DID find was that the codec was H264 and that VLC was using live555 to decode the stream. The media files live555 can support according to their website (http://www.live555.com/mediaServer/) make me think that the above stream is an H264 elementary stream and is not in a container format. Am I correct?
Also, if the stream indeed does not have a container format, is it ok to say the container format is RTP (not RTSP as ffprobe says) because that's the protocol used to send the media data?
Thanks!
RTSP is more of a handshake done with the server, while RTP is the actual stream that comes in once the handshake is done and you start streaming. RTSP URLs usually start with rtsp://... and the sequence of requests goes roughly like this:
DESCRIBE, SETUP, PLAY, TEARDOWN
The server's response to DESCRIBE contains the information you need about the encoding of the stream (H264, JPEG, etc.), while PLAY causes the server to start sending the RTP stream. I suggest looking up SDP (Session Description Protocol), the format of the DESCRIBE response body, for how to extract this information.
In the case of streams you are most likely correct, since the protocol used for streaming is usually RTP, and it tends to go hand in hand with RTSP. However, I'm unsure whether the term "container" applies in the context of streaming.
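As a rough illustration of pulling the encoding out of a DESCRIBE response, here is a small C++ sketch that finds the first a=rtpmap: attribute in an SDP body and splits it into payload type, encoding name and clock rate. The struct RtpMap and the function parseFirstRtpMap are hypothetical names, and the sketch assumes the SDP body has already been separated from the RTSP headers. A real session can carry several media sections, which this ignores.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

struct RtpMap {
    int payloadType = -1;   // dynamic RTP payload type, e.g. 96
    std::string encoding;   // encoding name, e.g. "H264"
    int clockRate = 0;      // RTP clock rate, e.g. 90000
};

// Parses the first "a=rtpmap:<pt> <encoding>/<clock rate>" line of an SDP body.
// Returns false if no such line exists or it cannot be parsed.
bool parseFirstRtpMap(const std::string& sdp, RtpMap& out) {
    const std::string prefix = "a=rtpmap:";
    std::istringstream lines(sdp);
    std::string line;
    while (std::getline(lines, line)) {
        // SDP lines end in CRLF; drop the CR that getline leaves behind.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.compare(0, prefix.size(), prefix) != 0) continue;

        std::istringstream fields(line.substr(prefix.size()));
        std::string codec;
        if (!(fields >> out.payloadType >> codec)) return false;

        const std::size_t slash = codec.find('/');
        if (slash == std::string::npos) return false;
        out.encoding = codec.substr(0, slash);

        std::istringstream rate(codec.substr(slash + 1));
        return static_cast<bool>(rate >> out.clockRate);
    }
    return false;
}
```

For a line such as a=rtpmap:96 H264/90000 this yields payload type 96, encoding H264 and a 90 kHz clock, which tells you the RTP payload is H.264 packetized directly rather than wrapped in a file container.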

Use Media Foundation H.264 encoder with live555

I want to create an H.264 RTSP stream using the live555 streaming library. For encoding the video frames, I want to use the H.264 encoder MFT. Encoding works using the basic processing model (I do not build a graph, but call the MFT manually). Streaming using a custom FramedSource source also seems to work in the sense that the programme is not crashing and the stream is stable in VLC player. However, the image is crippled - no colour, weird line patterns etc.
I assume that I pass the wrong data from the encoder into the streaming library, but I have not been able to find out what the library is actually expecting. I have read that the Microsoft H.264 encoder outputs more than one NAL in a sample. I further found that live555 requires a single NAL to be returned in doGetNextFrame. Therefore, I try to identify the individual NALs (What does this H264 NAL Header Mean? states that the header can be 3 or 4 bytes - I do not know where to get the information what MF uses, but the memory view of the debugger suggests 4 bytes):
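// Scan for 4-byte start codes (00 00 00 01). The loop bound keeps the
// comparison inside the buffer; memcmp (from <cstring>) avoids an unaligned read.
static const BYTE startCode[] = { 0, 0, 0, 1 };
for (DWORD i = 0; i + sizeof(startCode) <= sampleLen; ++i) {
    if (::memcmp(sampleData + i, startCode, sizeof(startCode)) == 0) {
        nals.push_back(sampleData + i);
    }
}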
This piece of code usually identifies more than one item in one output sample from the MFT. However, if I copy the ranges found by this loop into the fTo output buffer, VLC does not show anything and stops after a few seconds. I also read somewhere that live555 does not want the magic number 0x00000001, so I tried to skip it. The effect on the client side is the same.
Is there any documentation on what live555 expects me to copy into the output buffer?
Does the H.264 encoder in Media Foundation at all produce output samples which I can use for streaming?
Do I need to split the output samples? How much do I need to skip once I have found a magic number (How to write a Live555 FramedSource to allow me to stream H.264 live suggests that I might need to skip more than the magic number, because the accepted answer only passes the payload part of the NAL)?
Is there any way to test whether the samples returned by the H.264 MFT in basic processing mode form a valid H.264 stream?
Here's how I did it: MFWebCamRtp.
I was able to stream my webcam feed and view it in VLC. There was no need to dig into NALs or such. Each IMFSample from the Media Foundation H264 encoder contains a single NAL that can be passed straight to live555.
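If an encoder does hand you several NALs in one buffer, splitting them could look roughly like the C++ sketch below. It accepts both the 3-byte (00 00 01) and 4-byte (00 00 00 01) start codes and returns each NAL unit's offset and length with the start code excluded, the form the question reports live555 as wanting. The function name splitAnnexB is made up for illustration, and the sketch assumes the buffer is Annex B formatted.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Returns (offset, length) for every NAL unit in an Annex B buffer.
// Start codes are not part of the returned ranges.
std::vector<std::pair<std::size_t, std::size_t>>
splitAnnexB(const std::uint8_t* data, std::size_t size) {
    std::vector<std::pair<std::size_t, std::size_t>> nals;
    std::size_t nalStart = 0;
    bool open = false;  // true once the first start code has been seen

    // Closes the current NAL unit at `end`. Zero bytes just before a start
    // code belong to the 4-byte start code form or to padding, because a
    // NAL unit itself never ends in a zero byte.
    auto closeNal = [&](std::size_t end) {
        while (end > nalStart && data[end - 1] == 0) --end;
        if (end > nalStart) nals.emplace_back(nalStart, end - nalStart);
    };

    std::size_t i = 0;
    while (i + 3 <= size) {
        if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
            if (open) closeNal(i);
            nalStart = i + 3;  // first byte after the start code (the NAL header)
            open = true;
            i += 3;
        } else {
            ++i;
        }
    }
    if (open) closeNal(size);
    return nals;
}
```

The byte at each returned offset is the NAL header, so (data[offset] & 0x1F) gives the NAL unit type, for example 7 for SPS, 8 for PPS and 5 for an IDR slice. That is a quick way to sanity-check what the encoder actually produced.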

Decoding RTP payload as H264 using wireshark

I am streaming an RTSP video from VLC on Windows to an iPad app, and I capture the packets in Wireshark. I can see the RTP packets in Wireshark and also the RTP header fields like payload type, timestamp and sequence number. My question is: is it possible to decode the RTP payload as H264 NAL units? Currently I can only see the bytes in the payload.
You need to configure Wireshark to understand that the RTP dynamic payload type maps to H264.
To do this, use the menu: Edit->Preferences->Protocols->H264
Set H264 dynamic payload types to the value shown in the RTP decode for the payload type.

H264 with multiple PPS and SPS

I have a card that produces an H264 stream with an SPS (Sequence Parameter Set) and a PPS (Picture Parameter Set), in that order, directly before each I-Frame. I see that most H264 streams contain a PPS and SPS at the first I-Frame.
Is this recommended? Do decoders/muxers typically support multiple PPS and SPS?
H.264 comes in a variety of stream formats. One variation is called "Annex B".
(AUD)(SPS)(PPS)(I-Slice)(PPS)(P-Slice)(PPS)(P-Slice) ... (AUD)(SPS)(PPS)(I-Slice).
Typically you see SPS/PPS before each I frame and PPS before other slices.
Most decoders/muxers are happy with "Annex B" and the repetition of SPS/PPS.
Most decoders/muxers won't do anything meaningful if you change the format and SPS/PPS midstream.
Most decoders/muxers parse the first SPS/PPS as part of a setup process and ignore subsequent SPSs.
Some decoders/muxers prefer H.264 without the (AUD), start codes and SPS/PPS.
Then you have to feed SPS/PPS out of band as part of setting up the decoders/muxers.
An IDR frame, or an I-slice, cannot be decoded without an SPS and PPS. In the case of a container like mp4, the SPS and PPS are stored away from the video data, in the file header. Upon playback the mp4 is parsed, the SPS/PPS are used to configure the AVC decoder once, and then video can be played back starting at any IDR/I-slice.
There is a second scenario: live video. With live video there is no file header, because there is no file. So when a TV tunes into a channel, where does it get the SPS/PPS? Because television is broadcast, meaning the television has no way to request the SPS/PPS, they are repeated in the stream.
So when you start encoding video, your encoder does not know what you intend to do with it. If the extra SPS/PPS show up in an mp4, the decoder just ignores them, but if you are streaming to a TV, the stream would never play without them. So most encoders default to repeating the SPS/PPS just in case.
I know the Matroska (MKV) spec: there the SPS and PPS are stored only once, in the codec private data section, so they are not repeated with every I-frame or IDR frame.
If every I-frame/IDR frame in your H264 stream carries an SPS/PPS, the Matroska muxer will still store only one copy in the codec private data.
So storage-oriented container formats suggest keeping a single copy of the SPS/PPS, whereas broadcast and streaming formats suggest sending the SPS/PPS before every I-frame/IDR frame, or whenever the codec parameters of the H264 stream change.

detect key-frame in TS with H264 codec

Is there an easy, not horrifyingly complex way to detect a key frame in an H264 video stream wrapped in a Transport Stream?
Also, if earlier packets are needed to decode the key frame, is there a way to find those as well?
There is no super simple way of finding the I-frame. You have to read the transport stream packets that carry the AVC stream, assemble the packetized elementary stream (PES) packets, strip the PES header and then look for a NAL unit of type 5.
So you will need a transport stream demuxer, a way to find the beginning of PES packets, and minimal H.264 parsing.
For demuxing you could look at this source code: http://tsdemuxer.googlecode.com/svn/trunk/v1.0/tsdemux.cpp
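To give an idea of how little parsing the two ends of that pipeline need, here is a C++ sketch of the first and last steps only: reading the 4-byte transport stream packet header, and scanning an elementary stream buffer for an IDR slice (NAL type 5). PES reassembly and PES header stripping sit in between and are not shown. The names TsHeader, parseTsHeader and containsIdr are invented for this example, and the sketch assumes plain 188-byte TS packets.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kTsPacketSize = 188;

struct TsHeader {
    bool payloadUnitStart;    // set when a new PES packet starts in this TS packet
    std::uint16_t pid;        // 13-bit packet identifier of the elementary stream
    bool hasAdaptationField;  // an adaptation field precedes the payload
    bool hasPayload;          // the packet carries payload bytes
};

// Parses the fixed 4-byte header of one TS packet.
// Returns false if the buffer is too short or the sync byte 0x47 is missing.
bool parseTsHeader(const std::uint8_t* p, std::size_t size, TsHeader& h) {
    if (size < kTsPacketSize || p[0] != 0x47) return false;
    h.payloadUnitStart = (p[1] & 0x40) != 0;
    h.pid = static_cast<std::uint16_t>(((p[1] & 0x1F) << 8) | p[2]);
    h.hasAdaptationField = (p[3] & 0x20) != 0;
    h.hasPayload = (p[3] & 0x10) != 0;
    return true;
}

// Returns true if an Annex B elementary stream buffer (PES header already
// stripped) contains a start code followed by a NAL header of type 5 (IDR).
bool containsIdr(const std::uint8_t* es, std::size_t size) {
    for (std::size_t i = 0; i + 3 < size; ++i) {
        if (es[i] == 0 && es[i + 1] == 0 && es[i + 2] == 1 &&
            (es[i + 3] & 0x1F) == 5) {
            return true;
        }
    }
    return false;
}
```

A typical use would be to filter packets by the video PID, start a new PES buffer whenever payloadUnitStart is set, and run containsIdr on the reassembled payload. NAL types 7 (SPS) and 8 (PPS) found just before the IDR are the extra data a decoder needs to start at that key frame.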