Comparing H.264 encoding/decoding performance

I am a beginner with video codecs, not a video codec expert.
I just want to know, based on the same criteria, which is more efficient: H.264 encoding or decoding?
Thanks

Decoding is more efficient. To be useful, decoding must run in real time, where encoding does not (except in videophone / conferencing applications).
How much more efficient? An encoder can generate motion vectors. The more compute power used on generating those motion vectors, the more accurate they are. And, the more accurate they are, the more bandwidth is available for the difference frames, so the quality goes up.
So, the kind of encoding used to generate video for streaming or distribution on DVD or BD discs can run many times slower than real time on server farms. But decoding for that kind of program is useless unless it runs in real time.
Even in the case of real-time encoding, encoding takes more power (actual milliwatts, compute cycles, etc.) than decoding.
This is true of H.264, H.265, VP8, VP9, and other video codecs.
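If you want to see this trade-off yourself, a quick experiment is to encode the same clip with different x264 presets and compare encode times at the same quality target. Here is a minimal sketch; it assumes ffmpeg with libx264 is installed and uses a placeholder input.mp4, neither of which comes from the question:

```python
# Rough sketch: slower presets spend more cycles on motion estimation and
# other searches, so they take longer but compress better at the same quality.
# "input.mp4" is a placeholder; requires ffmpeg with libx264 on PATH.
import subprocess
import time

for preset in ("ultrafast", "medium", "veryslow"):
    start = time.time()
    subprocess.run(
        ["ffmpeg", "-y", "-i", "input.mp4",
         "-c:v", "libx264", "-preset", preset, "-crf", "23",
         f"out_{preset}.mp4"],
        check=True,
    )
    print(f"{preset}: encoded in {time.time() - start:.1f}s")
```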

Related

Efficiently streaming audio with large gaps of silence

I would like to create a web page where users can listen to various public safety radio streams in an area. A typical setup involves running a traditional radio scanner to make one audio stream.
My goal is to have a page with many different streams, each one can be selected "on" or "off", and they can play simultaneously.
The naive approach would just be to scale up the traditional audio stream method with one stream per radio channel, however I am concerned about the bandwidth demands and reliability. Users report their streams dropping out and having to reconnect.
My next idea is to buffer the streams through an FFmpeg instance that records them to disk while also cutting out the silence, then monitor its output with some code to "push" the new audio clips out to the listeners.
I don't have any code written yet, I am looking for suggestions on an overall approach to take.
My solution seems overly complex to me. Does anyone know of an audio codec or streaming solution that is well suited to audio with long periods of silence? Or is my idea the best way to do this? Can you think of any improvements?
The naive approach would just be to scale up the traditional audio stream method with one stream per radio channel
This is what I would recommend doing. It's simple, fairly low latency, has no special requirements on the server or the client.
however I am concerned about the bandwidth demands and reliability
Bandwidth for this is minimal. I recommend using Opus for the best quality for the bandwidth. Also, consider using VBR for the encoding. You'll end up with very low bandwidth when there is silence, with more bandwidth used while there's actual content. This is similar to what you were considering doing, but already built into the codec.
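For example (just a sketch; the file name, bitrate, and the use of ffmpeg are my assumptions, not something from your setup), encoding one channel with libopus in VBR mode could look like this:

```python
# Sketch: re-encoding one scanner feed with Opus VBR via ffmpeg.
# "scanner_feed.wav", the bitrate, and the output name are placeholders;
# assumes ffmpeg is built with libopus.
import subprocess

subprocess.run(
    ["ffmpeg", "-y", "-i", "scanner_feed.wav",
     "-c:a", "libopus", "-b:a", "16k",
     "-vbr", "on",                 # variable bitrate: silence costs very little
     "-application", "voip",       # tuned for speech
     "channel1.opus"],
    check=True,
)
```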

Is it possible to remove start codes using NVENC?

I'm using NVENC SDK to encode OpenGL frames and stream them over RTSP. NVENC gives me encoded data in the form of several NAL units. In order to stream them with Live555 I need to find the start code (0x00 0x00 0x01) and remove it. I want to avoid this operation.
NVENC has a sliceOffset attribute which I can consult, but it indicates slices, not NAL units. It only points to the end of the SPS and PPS headers, where the actual data starts. I understand that a slice is not equal to a NAL unit (correct me if I'm wrong). I'm already forcing single slices for encoded data.
Is any of the following possible?
Force NVENC to encode individual NAL units
Force NVENC to indicate where the NAL units in each encoded data block are
Make Live555 accept the sequence parameters for streaming
There seems to be a point where every person trying to do H.264 over RTSP/RTP comes down to this question. Well here are my two cents:
1) There is the concept of an access unit. An access unit is a set of NAL units (it may well be only one) that represents one encoded frame. That is the level of logic you should work at. If you ask the encoder to give you individual NAL units, what behavior do you expect when the encoding procedure produces multiple NAL units from one raw frame (e.g. SPS + PPS + coded picture)? That being said, there are ways to configure the encoder to reduce the number of NAL units in an access unit (such as not including the AUD NAL, not repeating SPS/PPS, and excluding SEI NALs). With that knowledge you know what to expect and can more or less force the encoder to give you a single NAL per frame (of course this will not work for all frames, but with that knowledge you can handle the exceptions). I'm not an expert on the NVENC API, I've also just started using it, but at least with Intel Quick Sync, turning off AUD and SEI and disabling repetition of SPS/PPS gave me roughly one NAL per frame for frames 2...N.
2) I won't be able to answer this since, as I mentioned, I'm not familiar with the API, but I highly doubt it.
3) SPS and PPS should be in the first access unit (the first bit-stream you get from the encoder); you could just find the right NALs in the bit-stream and extract them, or there may be a special API call to obtain them from the encoder.
All that being said, I don't think it is that hard to actually run through the bit-stream, parse the start codes, extract the NAL units, and feed them to Live555 one by one. Of course, if the encoder can output the bit-stream in the AVCC format (which, unlike the Annex B start-code format, interleaves a length value between the NAL units so you can jump to the next one without looking for the prefix), then you should use it. When it is just RTP, it's easy enough to implement the transport yourself (I've had bad luck with GStreamer, which did not have proper support for FU-A packetization); in the case of RTSP the overhead of the transport infrastructure is bigger and it is reasonable to use a third-party library like Live555.
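To back up that last point, here is a minimal Python sketch of walking an Annex B stream and splitting it on start codes. Nothing NVENC- or Live555-specific is assumed here, just the start-code convention; the sample bytes at the bottom are made up:

```python
# Sketch: split an Annex B bitstream into NAL units by scanning for
# 3- and 4-byte start codes (0x000001 / 0x00000001). Simplified: a single
# trailing zero before a start code is treated as part of the next prefix.
def split_nal_units(annexb: bytes):
    units = []
    i = annexb.find(b"\x00\x00\x01")
    while i != -1:
        start = i + 3
        nxt = annexb.find(b"\x00\x00\x01", start)
        end = len(annexb) if nxt == -1 else nxt
        if nxt != -1 and annexb[nxt - 1] == 0:
            end = nxt - 1  # the zero belongs to a 4-byte start code
        units.append(annexb[start:end])
        i = nxt
    return units

if __name__ == "__main__":
    # Tiny synthetic example: a fake "SPS" and one "slice"; real data would
    # come from the NVENC output buffer.
    bitstream = b"\x00\x00\x00\x01\x67\x42\x00\x1e" + b"\x00\x00\x01\x65\xb8\x00"
    for nal in split_nal_units(bitstream):
        nal_type = nal[0] & 0x1F  # nal_unit_type is the low 5 bits of the header byte
        print(f"NAL type {nal_type}, {len(nal)} bytes")
```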

HLS - how to reduce delay?

Does anyone know how to configure an HLS media server to reduce the delay of live streaming video a little?
What types of parameters do I need to change?
I had heard that you could do some tuning using parameters like this: HLSMediaFileDuration
Thanks in advance
An HTTP Live Streaming system typically has an encoder, which produces segments of a certain number of seconds, and a media server (web server), which serves playlists containing a list of URLs to those segments to player applications.
Media Files = Segments = .ts files = MPEG2-TS files (in HLS speak)
There are some ways to reduce the delay in HLS:
Reduce the encoded segment length from Apple's recommended 10 seconds to 5 seconds or less. Reducing segment length increases network overhead and load on the web server.
Use lower bitrates; larger .ts files take longer to upload and download. If you use multi-bitrate streams, make sure the first bitrate listed in the playlist is a little lower than the bitrate most of your users use. This will reduce the time it takes to start playing back the stream.
Get the segments from the encoder to the web server faster. Upload while still encoding if possible. Update the playlist as soon as the segment has finished uploading.
Also remember that the higher the delay the better the quality of your stream (low delay = lower quality). With larger segments, there is less overhead so more space for video data. Taking a longer time to encode results in better quality. More buffering results in less chance of video streams stuttering on playback.
HLS is all about trading quality of playback for longer delay, so you will never be able to use HLS for things like video conferencing. Typical delay in HLS is 30-60 sec, minimum in practice is around 15 sec. If you want low delay use RTP for streaming, but good luck getting good quality on low or variable speed networks.
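As one concrete (assumed, not taken from the answer above) way to apply the first point, ffmpeg's hls muxer lets you choose the segment length and playlist window directly; the input file and the exact option values below are placeholders:

```python
# Sketch: producing short HLS segments with ffmpeg's hls muxer. Shorter
# hls_time lowers delay at the cost of more requests and more overhead.
# "camera.mp4" stands in for whatever live source you actually have.
import subprocess

subprocess.run(
    ["ffmpeg", "-re", "-i", "camera.mp4",
     "-c:v", "libx264", "-preset", "veryfast", "-g", "48",
     "-f", "hls",
     "-hls_time", "2",              # target segment length in seconds
     "-hls_list_size", "5",         # keep a short live playlist window
     "-hls_flags", "delete_segments",
     "stream.m3u8"],
    check=True,
)
```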
Please specify which media server you use. Generally speaking, yes, changing the chunk size will definitely affect the delay: the smaller the first chunk, the quicker the video will be shown in the player.
Apple actually recommends dividing your file into small chunks of equal, integer-second length.
In practice, there is a huge difference between players. Some of them parse the manifest and adjust to these values.
A known practice is to pre-cache the first chunks in memory at low and medium resolution (or to try downloading them in the background of the app/page; Amazon does this, though their video is MSS).
I was having the same problem and the keys for me were:
Lower the segment length. I set it to 2 s because I'm streaming on a local network. On other types of networks, you need to be careful with the overhead that a low segment length adds, which can impact your playback quality.
In your manifest, make sure the #EXT-X-TARGETDURATION is accurate. From here:
The EXT-X-TARGETDURATION tag specifies the maximum Media Segment duration. The EXTINF duration of each Media Segment in the Playlist file, when rounded to the nearest integer, MUST be less than or equal to the target duration; longer segments can trigger playback stalls or other errors. It applies to the entire Playlist file.
For some reason, the #EXT-X-TARGETDURATION in my manifest was set to 5 and I was seeing a 16-20s delay. After changing that value to 2, which is the correct one according to my segments' length, I am now seeing delays of 6-10s.
In summary, you should expect a delay of at least 3X your #EXT-X-TARGETDURATION. So, lowering the segment length and the #EXT-X-TARGETDURATION value should help reduce the delay.
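If you want to sanity-check this on your own playlist, a small script can compare #EXT-X-TARGETDURATION against the rounded #EXTINF values, per the rule quoted above (the playlist path is a placeholder):

```python
# Sketch: verify that #EXT-X-TARGETDURATION is consistent with the rounded
# #EXTINF segment durations in an HLS media playlist.
def check_target_duration(playlist_path: str) -> None:
    target = None
    durations = []
    with open(playlist_path) as f:
        for line in f:
            line = line.strip()
            if line.startswith("#EXT-X-TARGETDURATION:"):
                target = int(line.split(":", 1)[1])
            elif line.startswith("#EXTINF:"):
                durations.append(float(line.split(":", 1)[1].split(",")[0]))
    worst = max(round(d) for d in durations)
    print(f"target={target}, longest rounded segment={worst}")
    if target is None or worst > target:
        print("warning: segments exceed EXT-X-TARGETDURATION; expect stalls")

check_target_duration("stream.m3u8")  # placeholder path
```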

Tools to help parallelize H.264?

I am working with an H.264 decoder using the JM reference software. I am looking for parallelization tools to help parallelize the reference code of the H.264 decoder for multiprocessor mapping. Please make suggestions, as I am relatively new to this area.
There is no naive way to solve this -- much less a general "automated conversion" approach.
Only a detailed understanding of how H.264 works and careful application of correct parallelization techniques, following a correctly parallelized algorithm, will yield useful results.
H.264, like most video formats, relies on temporal data frames and effectively only computes "a running delta", which makes this problem very complex. This is just one of the techniques used to achieve such good compression, but the complexity of the format does not stop there: most of the data is related in some fashion! (The more dependent the data is, the less suited it is to parallel processing.)
I would suggest looking for a (non-reference Open Source) implementation that uses threads, if such an implementation exists. Perhaps look at the codec used by VLC? (In the end I suspect more benefit comes from offloading to special hardware-assist modules such as those bundled with modern ATI or NVidia GPUs.)
If you are really interested in pursuing this, see...
EFFICIENT PARALLELIZATION OF H.264 DECODING WITH MACRO BLOCK LEVEL SCHEDULING
Parallel Scalability of H.264
A Highly Scalable Parallel Implementation of H.264
...and the million other white papers out there (search for "parallel decode h.264").
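To make the macroblock-level scheduling idea from those papers a bit more concrete, here is a toy Python sketch of the "2D wave" dependency order they exploit; the frame size is made up and the per-macroblock work is stubbed out:

```python
# Sketch of macroblock-level ("2D wave") scheduling: a macroblock at (x, y)
# can be processed once its left, upper, and upper-right neighbours are done,
# so all macroblocks on the same anti-diagonal x + 2*y are independent and
# can run in parallel. Real decoding work is replaced by a stub here.
from concurrent.futures import ThreadPoolExecutor

MB_COLS, MB_ROWS = 8, 6  # e.g. a tiny 128x96 frame (8x6 macroblocks)

def decode_macroblock(x: int, y: int) -> None:
    # Placeholder for intra prediction / motion compensation / deblocking.
    pass

with ThreadPoolExecutor() as pool:
    for d in range(MB_COLS + 2 * (MB_ROWS - 1)):
        # Every macroblock on diagonal d depends only on earlier diagonals.
        wave = [(x, y) for y in range(MB_ROWS) for x in range(MB_COLS)
                if x + 2 * y == d]
        list(pool.map(lambda xy: decode_macroblock(*xy), wave))
```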

I read that Huffman coding does not work on GPU but this paper claims otherwise

I have read in several places that building a Huffman encoder on a GPU is not very efficient because the algorithm is sequential. But this paper offers a possible implementation and claims it is faster than a CPU: http://tesla.rcub.bg.ac.rs/~taucet/docs/papers/PAVLE-AnaBalevic09.pdf
Please advise if the results of the paper are incorrect.
It looks like an interesting approach but I'll just offer one caveat: there is very little information about the baseline CPU implementation, but it is most likely single threaded and may not be particularly optimised. It's human nature for people to want to make their optimised implementation look as good as possible, so they tend to use a mediocre baseline benchmark in order to give an impressive speed up ratio. For all we know it may be that a suitably optimised multi-threaded implementation on the CPU could match the GPGPU performance, in which case the GPGPU implementation would not be so impressive. Before investing a lot of effort in a GPGPU implementation I would want to first exhaust all the optimisation possibilities on the CPU (perhaps even using the parallel algorithm as described in the paper, maybe exploit SIMD, threading, etc), since a CPU implementation that meets your performance requirements would be a lot more portable and useful than a solution tied to a particular GPU architecture.
You are right: the Huffman algorithm is sequential, though that is not a bottleneck for high-speed encoding. Please have a look at the following session from GTC 2012. This is a real solution, not just an example.
There you can find benchmarks for Huffman encoding and decoding on the CPU and GPU. Huffman encoding on the GPU is much faster than on the CPU. JPEG decoding on the GPU can be much slower than on the CPU only in the case where there are no restart markers in the JPEG image.
If you need Huffman for something other than JPEG, you should use a two-pass algorithm: collect statistics on the first pass and do the encoding on the second pass. Both passes can be parallelized, so it's better to use the GPU instead of the CPU.
There are a lot of papers saying that the GPU is not suitable for Huffman. That just means there have been many attempts to solve the problem. The idea behind the solution is quite simple: do Huffman encoding on small chunks of data and process those chunks in parallel.
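As a toy CPU-side illustration of that two-pass, chunked idea (process pools standing in for GPU thread blocks; the payload and chunk size are made up):

```python
# Sketch: build a Huffman code from first-pass statistics, then encode
# fixed-size chunks independently so they can be processed in parallel.
import heapq
from collections import Counter
from concurrent.futures import ProcessPoolExecutor

def build_code(freq: Counter) -> dict:
    # Standard heap-based Huffman construction; returns symbol -> bit string.
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

def encode_chunk(args):
    chunk, code = args
    return "".join(code[b] for b in chunk)

if __name__ == "__main__":
    data = (b"a" * 6 + b"b" * 3 + b"c") * 4096   # placeholder payload
    chunk_size = 4096
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    code = build_code(Counter(data))             # pass 1: statistics + code table
    with ProcessPoolExecutor() as pool:          # pass 2: chunks encoded in parallel
        encoded = list(pool.map(encode_chunk, [(c, code) for c in chunks]))
    print(sum(len(e) for e in encoded) // 8, "bytes after Huffman coding")
```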