Downsampling possible within RTSP/RTP? - h.264

I have a media server serving several cameras.
I'd like for the server to downsample the data from, say, 20 fps to 1 fps.
Obviously I could do this by decoding and recoding the video frames - however, the server is a little resource constrained. I notice that if I simply drop RTP UDP packets, the output is not so good - I see both tearing and junk in the images (at least with a opencv/ffmpeg client).
Is it possible to downsample within an RTP stream by dropping more carefully chosen frames/packets, to avoid junk and tearing in the output? (Currently I'm able to extract RTP|H264 raw data chunks on the server, but am not running them through a full codec).

An H.264 stream consists of different frame types: I (or IDR), P and B.
I (or IDR) frames are a full pictures and can be decoded without any other frames.
So you could filter out P and B frames and only pass on I frames.
Your resulting frame rates depends on the I (or IDR) frame frequency of the original stream. I am guessing you get somewhere between 0.1 to 2 fps.

Related

Is it possible to remove start codes using NVENC?

I'm using NVENC SDK to encode OpenGL frames and stream them over RTSP. NVENC gives me encoded data in the form of several NAL units. In order to stream them with Live555 I need to find the start code (0x00 0x00 0x01) and remove it. I want to avoid this operation.
NVENC has a sliceOffset attribute which I can consult, but it indicates slices, not NAL units. It only points the ending of the SPS and PPS headers, where the actual data starts. I understand that a slice is not equal to a NAL (correct me if I'm wrong). I'm already forcing single slices for encoded data.
Is any of the following possible?
Force NVENC to encode individual NAL units
Force NVENC to indicate where the NAL units in each encoded data block are
Make Live555 accept the sequence parameters for streaming
There seems to be a point where every person trying to do H.264 over RTSP/RTP comes down to this question. Well here are my two cents:
1) There is a concept of an access unit. An access unit is a set of NAL units (may be as well be only one) that represent an encoded frame. That is the level of logic you should work at. If you are saying that you want the encoder to give you individual NAL unit's, then what behavior do you expect when the encoding procedure results in multiple NAL units from one raw frame (e.g. SPS + PPS + coded picture). That being said, there are ways to configure the encoder to reduce the number of NAL units in an access unit (like not including the AUD NAL, not repeating SPS/PPS, exclude SEI NAL's) - with that knowledge you can actually know what to expect and kind of force the encoder to give you single NAL per frame (of course this will not work for all frames, but with the knowledge you have about the decoder you can handle that). I'm not an expert on the NVENC API, I've also just started using it, but at least as for Intel Quick Sync, turning off AUD,SEI and disabling repetition of PPS/SPS gave me roughly around 1 NAL per frame for frames 2...N.
2) Won't be able to answer this since as I mentioned I'm not familiar with the API but I highly doubt this.
3) SPS and PPS should be in the first access unit (the first bit-stream you get from the encoder) and you could just find the right NAL's in the bit-stream and extract them, or there may be a special API call to obtain them from the encoder.
All that being said, I don't think it is that hard to actually run through the bit-stream, parse the start codes and extract the NAL unit's and feed them to Live555 one by one. Of course, if the encoder offers to output the bit-stream in the AVCC format (compared to the start codes or Annex B it uses interleaved length value between the NAL units so you can just jump to the next one without looking for the prefix) then you should use it. When it is just RTP it's easy enough to implement the transport yourself, since I've had bad luck with GStreamer that did not have proper support for FU-A packetization, in case of RTSP the overhead of the transport infrastructure is bigger and it is reasonable to use a 3rd party library like Live555.

HLS - how to reduce delay?

Anyone know how configure the HLS media server for reduce a little bit the delay of live streaming video?
what types of parameters i need to change?
I had heard that you could do some tuning using parameters like this: HLSMediaFileDuration
Thanks in advance
A Http Live Streaming system typically has an encoder which produces segments of a certain number of seconds and a media server (web server) which serves playlists containing a list of URLs to these segments to player applications.
Media Files = Segments = .ts files = MPEG2-TS files (in HLS speak)
There are some ways to reduce the delay in HLS:
Reduce the encoded segment length from Apple's recommended 10 seconds to 5 seconds or less. Reducing segment length increases network overhead and load on the web server.
Use lower bitrates, larger .ts files take longer to upload and download. If you use multi-bitrate streams, make sure the first bitrate listed in the playlist is a little lower than the bitrate most of your users use. This will reduce the time to start playing back the stream
Get the segments from the encoder to the web server faster. Upload while still encoding if possible. Update the playlist as soon as the segment has finished uploading
Also remember that the higher the delay the better the quality of your stream (low delay = lower quality). With larger segments, there is less overhead so more space for video data. Taking a longer time to encode results in better quality. More buffering results in less chance of video streams stuttering on playback.
HLS is all about trading quality of playback for longer delay, so you will never be able to use HLS for things like video conferencing. Typical delay in HLS is 30-60 sec, minimum in practice is around 15 sec. If you want low delay use RTP for streaming, but good luck getting good quality on low or variable speed networks.
Please specify which media server you use. Generally speaking, yes - changing chunk size will definitely affect delay time. The less is the first chunk, the quicker the video will be shown in the player.
Actually, Apple recommend to divide your file to small chunks this equal length of file, integers.
In practice, there is huge difference between players. Some of them parse manifest changing this values.
Known practice is to pre-cache in memory first chunks in low & medium resolution (Or try to download them in background of app/page - Amazon does this, though their video is MSS)
I was having the same problem and the keys for me were:
Lower the segment length. I set it to 2s because I'm streaming on a local network. In other type of networks, you need to be careful with the overhead that a low segment length adds that can impact your playback quality.
In your manifest, make sure the #EXT-X-TARGETDURATION is accurate. From here:
The EXT-X-TARGETDURATION tag specifies the maximum Media Segment
duration. The EXTINF duration of each Media Segment in the Playlist
file, when rounded to the nearest integer, MUST be less than or equal
to the target duration; longer segments can trigger playback stalls
or other errors. It applies to the entire Playlist file.
For some reason, the #EXT-X-TARGETDURATION in my manifest was set to 5 and I was seeing a 16-20s delay. After changing that value to 2, which is the correct one according to my segments' length, I am now seeing delays of 6-10s.
In summary, you should expect a delay of at least 3X your #EXT-X-TARGETDURATION. So, lowering the segment length and the #EXT-X-TARGETDURATION value should help reducing the delay.

Reduce FPS on OSMF stream - Issue with MPEG-2 header

I've been searching all over and can't find a solution. I have a 25 FPS video that I'm playing on OSMF, but OSMF insists on playing with 29-31 FPS. This causes the video to play ~15% faster than real time. The result is extremely noticeable if you open the same video in VLC and play it side by side.
The problem comes in when I try to do a live stream. It will eat through the buffer and catch up to real time then the stream crashes because there's no new video waiting.
I've tried tracing the code to find out where the frames are actually output to the screen, but I hit a dead end at the SWC file. I also have tried searching online but I can't find anything about limiting the FPS - everyone is just interested in increasing it.
I'd rather play at 15 FPS and drop 10 frames per second than catch up to real time and crash tragically.
Edit - after an entire weekend spent staring at this issue I've made some incredible headway. First and foremost, the only way to limit the FPS in OSMF is by sending a custom FLV header with the timestamp set appropriately (1000 / FPS difference between each frame)
Realizing this I could solve this issue I'm having temporarily by manually setting the timestamps based on an internal counter. Each time a frame is processed set timestamp = last_timestamp + 40;. The problem is that I don't know if video will always be 25 FPS. Some day I may have 30 FPS or even 60 FPS video streams. To make this more robust I decided to decode the MPEG-2 header (read the PTS value) and convert it to an FLV header.
Now here's the issue… This video file (theoretically 25 FPS) plays perfectly in QuickTime. As a result I know the headers are fine because an expensive piece of software with billions of dollars behind it properly calculated the frame rate. But when I read the PTS from the header (as per this SO post) and divide by 90 (convert 90Khz clock to millisecond timestamp) each timestamp is 33 or 34 milliseconds apart - the 29~31 FPS I was getting.
So why is the PTS giving me timestamps that are 33-34 milliseconds apart when I know the video is 25 FPS (40 milliseconds apart)? More importantly, how is QuickTime reading the MPEG-2 header so that everything plays correctly?
First, to answer my original question:
Reducing FPS in OSMF
You must create your own implementation of the file handler which parses the video header and modifies the timestamps. There is no way to tell OSMF to play in "slow motion" or "fast forward". For an example of creating your own file handler, look at the FLVParser class. Notice that there are separate classes for parsing video and audio tags. The headers must be updated in EACH of these to ensure that video and audio play back in sync.
When a file is passed to the video parser, each timestamp is considered relative to the first. So if the first timestamp is 1234 then this will be set as "time zero" and all future timestamps will be relative to this. This is important. If you skip a video tag and the first timestamp you send is from later in the video, it will use the wrong value for "time zero" and things will not be sync'd properly.
This bring us to…
My issue
First and foremost, the durations in the M3U8 did not match up with the sum of the differences of timestamps. Starting at the first timestamp and labeling it "time zero", then looking at the last time stamp and subtracting time zero from it, the resulting time span was not equal to the expected duration of the TS file.
It turns out when VLC or QuickTime encounter this situation (the sum of PTS values does not equal the duration of the video) they generate new headers on the fly. Take the duration, divide by the number of frames, there's your new offset between PTS values.
That was my first problem. Once that was solved I was no longer gaining a second every 10 seconds, but instead gaining a second every 2 minutes. Turns out I was also loading the next TS file a bit too early (off-by-one error) which was causing me to drop a packet from each TS. This led to losing one frame every 10 seconds.
Additional Info
I ran into another issue with FPS shortly after this which I have found a solution for. Anyone experiencing OSMF playing videos too quickly, I urge you to grep your code for bufferTimeMax.
When bufferTimeMax > 0 and bufferLength >= bufferTimeMax, audio plays faster until bufferLength reaches bufferTime. If a live stream is video-only, video plays faster until bufferLength reaches bufferTime.
Depending on how much playback is lagging (the difference between bufferLength and bufferTime), Flash Player controls the rate of catch-up between 1.5% and 6.25%. If the stream contains audio, faster playback is achieved by frequency domain downsampling which minimizes audible distortion.
6.25% = 0.975 seconds per 15 (gaining a second every 10~20)
You would need to set the FPS manually.
In Flash Professional, you can change the framerate in Document Properties.
In Flash Builder, you can set the framerate using metadata, as described here and here.
Thus...
[SWF(frameRate='15')]

How to play FLV video at "incoming frames speed" (not video-internal) coming from NetStream in ActionScript

How to play NetStream frames immediatly as they arrive without any additional AS framerate logic?
I have recorded some audio & video data packets from RTMP protocol received by Red5, and now I'm trying to send it back to the flash client in a loop by pushing packets to NetStream with incrementing timestamp. The looped sequence has length of about ~20 sec nad is build from about ~200 RTMP packets (VideoData/AudioData)
Environment: both Flash client and server on localhost, no network bottleneck, video is H.264 encoded earlier by same Flash client.
It generaly works, but video is not very fluent - there ale lot of freezes, slowdowns and long pauses. The slower packets transmitting causing the more pauses and freezes., even extreme long pauses like transmiting whole sequence 2x-3x times (~60 sec) without effect - this comes up when forwarding slower than ~2 RTPM packets per second.
The problem looks like some AS-logic is trying to force framerate of a video, not just output received frames, so one of my questions is does AS looks for in-video-frame fps info in live streaming? why it can play faster, but can't play slower? How can I play video "by frames" not synchronizing video fps with RTPM packets timestamps?
On the other side, if I push packets faster than recorder, the video is just faster but almost fluent - I just can't get slower or stable stream (still very irregular speed).
I have analysed some NetStream values:
.bufferLength = ~0 or 0.001, incrasing when I forward packets
extremaly fast (like targeting ~90fps)
.currentFPS = shows real FPS
count seen in Video object, not incoming frames/s
.info.currentBytesPerSecond = ~8 kB/s to ~50kB/s depending on
forwarding speed
.info.droppedFrames = frequently incrases, even if I
stream packets like 2/sec! also jumps after long self-initiated-pause (but buffer
is whole time 0!)
.info.isLive = true
.info.dataBufferLength = same as .bufferLength
It looks like AS is dropping frames, because of too rare RTMP packets receive - like expecting that they will arrive with internal-frame-encoded-fps-speed.
My currently best NetStreamconfiguration:
chatStream.videoReliable = false;
chatStream.audioReliable = false;
chatStream.backBufferTime = 0;
chatStream.bufferTime =0;
Note that if I set bufferTime to 1, video is paused until gathering "1 second of video" but this is not true - buffering is very slow, like assuming that video has FPS of 100 or 200 - even if I'm forwarding packets fast (like targeting ~15fps without buffer), the buffer is filing about 10-20 seconds.
Loop, of course, starts with keyframed video data and keyframe interval of sequence is about 15 frames.
Have you tried netStream.step(1)
http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/net/NetStream.html#step()
Also I would remove 'live' from the play() call.
Finally, maybe flash tries to sync to the audio like it does in regular timeline. Have you tried a video without audio channel?

Obtain the result ByteArray of the current playing sounds

I am developing an AIR application for desktop that simulate a drum set. Pressing the keyboard will result in a corresponding drum sound played in the application. I have placed music notes in the application so the user will try to play a particular song.
Now I want to record the whole performance and export it to a video file, say flv. I have already succeed in recording the video using this encoder:
http://www.zeropointnine.com/blog/updated-flv-encoder-alchem/
However, this encoder does not have the ability to record sound automatically. I need to find a way to get the sound in ByteArray at that particular frame, and pass it to the encoder. Each frame may have different Sound objects playing at the same time, so I need to get the bytes of the final sound.
I am aware that SoundMixer.computeSpectrum() can return the current sound in bytes. However, the ByteArray returned has a fixed length of 512, which does not fit in the requirement of the encoder. After a bit of testing, with a sample rate 44khz 8 bit stero, the encoder expects the audio byte data array to have a length of 5880. The data returned by SoundMixer.computeSpectrum() is much much shorter than the encoder required.
My application is running at 60FPS, and recording at 15FPS.
So my question is: Is there any way I can obtain the audio bytes in the current frame, which is mixed by more than one Sound objects, and has the data length enough for the encoder to work? If there is no API to do that, I will have to mix the audio and get the result bytes by myself, how can that be done?