h264 lossless coding - h.264

Is it possible to do completely lossless encoding in h264? By lossless, I mean that if I feed it a series of frames and encode them, and then if I extract all the frames from the encoded video, I will get the exact same frames as in the input, pixel by pixel, frame by frame. Is that actually possible?
Take this example:
I generate a bunch of frames, encode the image sequence to an uncompressed AVI (with something like VirtualDub), and then apply lossless h264 (the help files claim that setting --qp 0 gives lossless compression, but I am not sure if that means there is no loss at any point of the process or just that the quantization is lossless). I can then extract the frames from the resulting h264 video with something like mplayer.
I tried with Handbrake first, but it turns out it doesn't support lossless encoding. I tried x264 but it crashes. It may be because my source AVI file is in RGB colorspace instead of YV12. I don't know how to feed a series of YV12 bitmaps and in what format to x264 anyway, so I cannot even try.
In summary, what I want to know is whether there is a way to go from
Series of lossless bitmaps (in any colorspace) -> some transformation -> h264 encode -> h264 decode -> some transformation -> the original series of lossless bitmaps
Is there a way to achieve this?
EDIT: There is a VERY valid point about lossless H264 not making much sense. I am well aware that there is no way I could tell (with just my eyes) the difference between an uncompressed clip and another compressed at a high rate in H264, but I don't think it is without uses. For example, it may be useful for storing video for editing without taking huge amounts of space, losing quality, or spending too much encoding time every time the file is saved.
UPDATE 2: Now x264 doesn't crash. I can use as sources either AviSynth or lossless YV12 Lagarith (to avoid the colorspace compression warning). However, even with --qp 0 and an RGB or YV12 source I still get some differences, minimal but present. This is troubling, because all the information I have found on lossless predictive coding (--qp 0) claims that the whole encoding should be lossless, but I am unable to verify this.

I am going to add a late answer to this one after spending all day trying to figure out how to get YUV 4:4:4 pixels into x264. While x264 does accept raw 4:2:0 pixels in a file, it is really quite difficult getting 4:4:4 pixels passed in. With recent versions of ffmpeg, the following works for completely lossless encoding and extraction to verify the encoding.
First, write your raw YUV 4:4:4 pixels to a file in a planar format. The planes are a set of Y bytes, then the U and V bytes, where U and V use 128 as the zero value. Now, invoke ffmpeg, pass in the size of the raw YUV frames, and use the "yuv444p" pixel format twice, like so:
ffmpeg -y -s 480x480 -pix_fmt yuv444p -i Tree480.yuv \
-c:v libx264 -pix_fmt yuv444p -profile:v high444 -crf 0 \
-preset:v slow \
Tree480_lossless.m4v
Once the encoding to h264 and wrapping as a Quicktime file is done, one can extract the exact same bytes like so:
ffmpeg -y -i Tree480_lossless.m4v -vcodec rawvideo -pix_fmt yuv444p \
Tree480_m4v_decoded.yuv
Finally, verify the two binary files with diff:
$ diff -s Tree480.yuv Tree480_m4v_decoded.yuv
Files Tree480.yuv and Tree480_m4v_decoded.yuv are identical
Just keep in mind that you need to write the YUV bytes to a file yourself, do not let ffmpeg do any conversion of the YUV values!
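In case it is unclear what "write the YUV bytes to a file yourself" looks like, here is a minimal Python sketch under the same assumptions as the commands above (480x480 frames, the Tree480.yuv file name); the solid-gray test pattern is only a placeholder:
# Write a few solid-gray test frames as planar YUV 4:4:4: all Y bytes first,
# then all U bytes, then all V bytes, with 128 as the neutral chroma value.
WIDTH, HEIGHT, FRAMES = 480, 480, 30  # matches the 480x480 example above
with open("Tree480.yuv", "wb") as f:
    for _ in range(FRAMES):
        f.write(bytes([128]) * (WIDTH * HEIGHT))  # Y plane (mid gray)
        f.write(bytes([128]) * (WIDTH * HEIGHT))  # U plane (neutral chroma)
        f.write(bytes([128]) * (WIDTH * HEIGHT))  # V plane (neutral chroma)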

If x264 does lossless encoding but doesn't like your input format, then your best bet is to use ffmpeg to deal with the input file. Try starting with something like
ffmpeg -i input.avi -f yuv4mpegpipe -pix_fmt yuv420p -y /dev/stdout \
| x264 $OPTIONS -o output.264 /dev/stdin
and adding options from there. YUV4MPEG is a lossless uncompressed format suitable for piping between different video tools; ffmpeg knows how to write it and x264 knows how to read it.

FFmpeg has a "lossless" mode for x264; see the FFmpeg and x264 Encoding Guide, § Lossless H.264. In essence, it's -qp 0.

I don't know your requirements for compression and decompression, but a general-purpose archiver (like 7-Zip with LZMA2) should be able to compress about as well as, or in some cases significantly better than, a lossless video codec. And it is much simpler and safer than a whole video processing chain. The downside is the much slower speed, and that you have to extract before viewing. But for images, I think you should try it.
There are also lossless image formats, like .png.
For encoding lossless RGB with x264, you should use the command-line version of x264 (you can't trust GUIs in this edge case; they will probably mess up), r2020 or newer, with something like this:
x264 --qp 0 --preset fast --input-csp rgb --output-csp rgb --colormatrix GBR --output "the_lossless_output.mkv" "someinput.avs"
Any losses/differences between the input and output should be from some colour space conversion (either before encoding or at playback), wrong settings, or some header/metadata that was lost. x264 doesn't support RGBA, but RGB is fine. YUV 4:4:4 compression is more efficient, but you will lose some data in the colour space conversion since your input is RGB. YV12/i420 is much smaller, and by far the most common colour space in video, but you have less chroma resolution.
More information on x264 settings:
http://mewiki.project357.com/wiki/X264_Settings
Also, avoid lagarith. It uses x87 floating point... and there are better alternatives.
http://codecs.multimedia.cx/?p=303
http://mod16.org/hurfdurf/?p=142
EDIT:
I don't know why I was downvoted. Please leave a comment when you do that.

I agree that sometimes the loss in data is acceptable, but it's not simply a matter of how it looks immediately after compression.
Even a visually imperceptible loss of color data can degrade footage such that color correction, greenscreen keying, tracking, and other post tasks become more difficult or impossible, which adds expense to a production.
It really depends when and how you compress in the pipeline, but ultimately it makes sense to archive the original quality, as storage is usually far less expensive than reshooting.

To generate lossless H.264 with HandBrake GUI, set Video Codec: H.264, Constant Quality, RF: 0, H.264 Profile: auto. Though this file is not supported natively by Apple, it can be re-encoded as near-lossless for playback.
HandBrake GUI's Activity Window:
H.264 Profile: auto; Encoding at constant RF 0.000000...profile High 4:4:4 Predictive, level 3.0, 4:2:0 8-bit
H.264 Profile: high; Encoding at constant RF 0.000000...lossless requires high444 profile, disabling...profile High, level 3.0

If you can't get lossless compression using an h.264 encoder and decoder, perhaps you could look into two alternatives:
(1) Rather than passing all the data in h.264 format, some people are experimenting with transmitting some of the data with a residual "side channel":
(h.264 file) -> h264 decode -> some transformation -> a lossy approximation of the original series of bitmaps
(compressed residual file) --> decoder -> a series of lossless residual bitmaps
For each pixel in each bitmap, approximate_pixel + residual_pixel = a pixel bit-for-bit equal to the original pixel (see the sketch after this list).
(2) Use Dirac video compression format in "lossless" mode.
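A rough Python/NumPy sketch of the residual idea from (1), with made-up array values and a fake "decoded" frame standing in for the lossy h.264 reconstruction:
import numpy as np

# original: the source frame; decoded: the lossy h.264 reconstruction.
# Both are 8-bit arrays of the same shape; the values here are made up.
original = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
noise = np.random.randint(-2, 3, original.shape)  # stand-in for coding error
decoded = np.clip(original.astype(np.int16) + noise, 0, 255).astype(np.uint8)

# The residual can be negative, so keep it in a wider signed type
# (or bias/wrap it before entropy-coding it into the side channel).
residual = original.astype(np.int16) - decoded.astype(np.int16)

# The receiver reverses the step: lossy frame + residual == original, bit for bit.
reconstructed = (decoded.astype(np.int16) + residual).astype(np.uint8)
assert np.array_equal(reconstructed, original)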

Use FFmpeg with PowerShell. Type ffmpeg -h encoder=libx264rgb.
You can see Supported pixel formats: bgr0 bgr24 rgb24
When you encode RGB to YUV or vice versa, you always lose quality.
But if you use -pix_fmt yuv444p -profile:v high444, the loss is minimal.
But if you use the libx264rgb encoder with the rgb24 pixel format, you don't have any loss of quality.
A lot of applications (for example DaVinci Resolve) cannot read the rgb24 pixel format.
I recommend using:
ffmpeg -framerate [your fps] -i ["your sequence of rgb image.png"] -c:v libx264rgb -qp 0 -pix_fmt rgb24 -profile:v high444 -preset veryslow -level 6.2 "your_video.mov"
Unfortunately, I don't know how to create the image sequence, but it is possible in FFmpeg.

Related

How many bytes does ffprobe need?

I would like to use ffprobe to look at the information of media files. However, the files are not on my local disk, and I have to read from a remote storage. I can read the first n bytes, write them to a temporary file and use ffprobe to read the information. I would like to know the least such n.
I tested with a few files, and 512KB worked with all the files that I tested. However, I am not sure if that will work for all media files.
ffprobe (and ffmpeg) aims to parse two things when opening an input:
the input container's header
payload data from each stream, enough to ascertain salient stream parameters like codec attributes and frame rate.
The header size is generally proportional to the number of packets in the file i.e. a 3 hour MP4 file will have a larger header than a 3 min MP4.
(if the header is at the end of the file, then access to the first 512 kB won't help)
From each stream, ffmpeg will decode packets till its stream attributes have been populated. The number of bytes consumed will depend on stream bitrate and how many streams are present.
So, the strict response to 'I am not sure if that will work for all media files' is that it won't.
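If it helps, the trial-and-error approach described in the question can be scripted; this is only a Python sketch (the 512 KB figure comes from the question, the input.mp4 name and .mp4 suffix are assumptions, and it assumes ffprobe is on your PATH):
import subprocess
import tempfile

def probe_prefix(path, n_bytes=512 * 1024):
    # Copy the first n_bytes of path into a temp file and run ffprobe on it.
    with open(path, "rb") as src, tempfile.NamedTemporaryFile(suffix=".mp4") as tmp:
        tmp.write(src.read(n_bytes))
        tmp.flush()
        # ffprobe exits non-zero if it cannot parse the truncated prefix.
        return subprocess.run(["ffprobe", "-hide_banner", tmp.name]).returncode

print(probe_prefix("input.mp4"))  # 0 usually means the prefix was enough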

comparing h.264 encoding decoding performance

I am a beginner with video codecs, not an expert.
I just want to know, based on the same criteria, which is more efficient: H.264 encoding or decoding?
Thanks
Decoding is more efficient. To be useful, decoding must run in real time, where encoding does not (except in videophone / conferencing applications).
How much more efficient? An encoder can generate motion vectors. The more compute power used on generating those motion vectors, the more accurate they are. And, the more accurate they are, the more bandwidth is available for the difference frames, so the quality goes up.
So, the kind of encoding used to generate video for streaming or distribution on DVD or BD discs can run many times slower than real time on server farms. But decoding for that kind of program is useless unless it runs in real time.
Even in the case of real-time encoding it takes more power (actual milliwatts, compute cycles, etc) than decoding.
It's true of H.264, H.265, VP8, VP9, and other video codecs.

Re-encode an audio stream recording on the fly?

Is it possible to rip an audio stream with Variable Bit Rate encoding and re-encode it on the fly, as it is being recorded, with Constant Bit Rate encoding?
I am downloading an audio stream in AAC format with VBR encoding using cURL.
The duration of a VBR-encoded file is calculated from its byte length, resulting in a discrepancy in the reported duration on different players. This duration discrepancy does not allow me to seek and slice precisely. I would need to re-encode it somehow with a constant bit rate to get the seeking to work properly.
The audio stream is hours long, so re-encoding it afterwards takes far too much time and processing power.
Is there anything I can do about this?
Perhaps I can specify some settings in cURL to achieve a constant recording bit rate?

Is it possible to remove start codes using NVENC?

I'm using NVENC SDK to encode OpenGL frames and stream them over RTSP. NVENC gives me encoded data in the form of several NAL units. In order to stream them with Live555 I need to find the start code (0x00 0x00 0x01) and remove it. I want to avoid this operation.
NVENC has a sliceOffset attribute which I can consult, but it indicates slices, not NAL units. It only points to the end of the SPS and PPS headers, where the actual data starts. I understand that a slice is not equal to a NAL (correct me if I'm wrong). I'm already forcing single slices for encoded data.
Is any of the following possible?
Force NVENC to encode individual NAL units
Force NVENC to indicate where the NAL units in each encoded data block are
Make Live555 accept the sequence parameters for streaming
There seems to be a point where every person trying to do H.264 over RTSP/RTP comes down to this question. Well here are my two cents:
1) There is a concept of an access unit. An access unit is a set of NAL units (it may well be only one) that represents an encoded frame. That is the level of logic you should work at. If you are saying that you want the encoder to give you individual NAL units, then what behavior do you expect when the encoding procedure results in multiple NAL units from one raw frame (e.g. SPS + PPS + coded picture)? That being said, there are ways to configure the encoder to reduce the number of NAL units in an access unit (like not including the AUD NAL, not repeating SPS/PPS, excluding SEI NALs). With that knowledge you can actually know what to expect and more or less force the encoder to give you a single NAL per frame (of course this will not work for all frames, but with the knowledge you have about the encoder you can handle that). I'm not an expert on the NVENC API, I've also just started using it, but at least with Intel Quick Sync, turning off AUD and SEI and disabling repetition of PPS/SPS gave me roughly 1 NAL per frame for frames 2...N.
2) I won't be able to answer this since, as I mentioned, I'm not familiar with the API, but I highly doubt it.
3) SPS and PPS should be in the first access unit (the first bit-stream you get from the encoder), and you could just find the right NALs in the bit-stream and extract them, or there may be a special API call to obtain them from the encoder.
All that being said, I don't think it is that hard to actually run through the bit-stream, parse the start codes, and extract the NAL units and feed them to Live555 one by one. Of course, if the encoder offers to output the bit-stream in the AVCC format (compared to start codes / Annex B, it uses an interleaved length value between the NAL units, so you can just jump to the next one without looking for the prefix), then you should use it. When it is just RTP, it's easy enough to implement the transport yourself (I've had bad luck with GStreamer, which did not have proper support for FU-A packetization); in the case of RTSP, the overhead of the transport infrastructure is bigger and it is reasonable to use a 3rd-party library like Live555.
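For what it's worth, the start-code scan itself is short; here is a rough Python sketch (the file name is a placeholder, and it only handles the Annex B 3- and 4-byte start codes, nothing NVENC specific):
def split_nal_units(annexb):
    # Yield NAL units from an Annex B buffer with the start codes stripped.
    # Handles both 3-byte (00 00 01) and 4-byte (00 00 00 01) start codes.
    positions = []
    i = annexb.find(b"\x00\x00\x01")
    while i != -1:
        positions.append(i)
        i = annexb.find(b"\x00\x00\x01", i + 3)
    for n, pos in enumerate(positions):
        begin = pos + 3
        end = positions[n + 1] if n + 1 < len(positions) else len(annexb)
        unit = annexb[begin:end]
        # The zero byte preceding a 4-byte start code belongs to that start
        # code, not to this NAL unit, so strip it.
        if n + 1 < len(positions) and unit.endswith(b"\x00"):
            unit = unit[:-1]
        yield unit

with open("stream.h264", "rb") as f:  # placeholder file name
    for nal in split_nal_units(f.read()):
        if nal:
            # Low 5 bits of the first byte are nal_unit_type (7=SPS, 8=PPS, 5=IDR).
            print(nal[0] & 0x1F, len(nal))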

How does a stored image or video appear in binary on the hard drive?

In attempting to understand the concept of binary, my question is "How does a stored image or video look in binary on the hard drive?"
As for how it is physically stored, it depends on the technology of your storage device. For a hard disk drive you can read about it on Wikipedia.
The next layer is how the controller on the storage device sends the data to the motherboard.
Then how the motherboard sends the data to the operating system.
Then how the operating system stores the data on the disk (what file system it uses; NTFS is common in modern Windows installations.)
Finally, what you'll see when reading the data is groups of 8 bits (bytes), which are basically 8 on/off flags that together form 256 possible combinations. That is why most image formats store colors varying from 0-255 for each channel (red, green, blue). Most raw formats are stored linearly, so you can actually try reading them yourself. A raw image where the first pixel is red (assuming it stores the pixels left-to-right, top-to-bottom) would look like this in bits:
11111111 00000000 00000000
red green blue
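To make that concrete, here is a tiny Python sketch that writes such a headerless one-pixel "raw" file and prints the bits back (the file name is made up):
# One red pixel in a headerless RGB raw file: three bytes, one per channel.
with open("red_pixel.raw", "wb") as f:
    f.write(bytes([255, 0, 0]))  # R=255, G=0, B=0

with open("red_pixel.raw", "rb") as f:
    data = f.read()
print([format(b, "08b") for b in data])  # ['11111111', '00000000', '00000000']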
For more information, you'll have to be more specific.
Every file on disk is basically a number of bits in a row.
The difference between "binary" and "something else" (often called ASCII, or text, or...) is that non-binary is basically human readable when opened in a text editor. In other words: the bytes in the file map to human readable letter (and other) characters in some way a generic text editor knows how to handle.
So called binary files can only be interpreted back to that data that they actually contain when you know the format which was used to map the content (image, sound, movie, whatever) to a stream of zeros and ones. This mapping is called the file format and is usually part of the file name in the form of an extension. You need a piece of software that knows the mapping and can interpret the row of bits back into the original content.
Mind you: this is usually only a hint. Renaming a JPEG image file to have a .mp3 extension doesn't change it into an audio file; it is still just an image file, containing the image (=dimensions of the image in pixels + the color values for each pixel, basically) encoded into a stream of zeros and ones in the way described in the JPEG file format encoding description.
Check out the link: Binary File Format
Images are a sequential flow of colored dots... But it's not hardware dependent, i.e. your hard disk will store anything in whatever format your OS provides it. However, the OS maintains standards for saving file formats, otherwise a JPG image would not be valid across different platforms...
Similarly, videos are flows of images and audio data multiplexed into a sequential stream.
All data on commercial computer systems is stored in binary format (we'll ignore scientific studies into quantum and optical computing).
At the lowest level all files and processing by a computer are performed in binary. This is because our computing systems are powered by the flow of electrons. They either flow or don't. Electric current is on or off. 1 and 0.
The data stored on a hard disk is there due to pulsing of the hard disk write head coil which magnetises spots of hard disk material. These magnetised spots cause a current pulse in the read coil (in actual fact the read and write coils are the same) as the hard disk head passes over them. Hence the data is read as a stream of current pulses, 1s and 0s.
Now, processors are built to accept and process a finite number of binary "pulses" or data bits simultaneously (anything from 4 bits upwards). Hence a modern 64-bit PC can process 64 binary data bits, i.e. 64 1s and 0s, at any one time.
Now at a higher level, although all files are stored as binary and can be read in binary format, we help the processing of them by telling the processor what format to read them in. This is so that it processes the file data in small chunks, e.g. 8 bits or 1 byte for ASCII text.
The operating system provides the processor with a template for any given file. This is set up in an extension relation table, and according to the file extension the operating system will expect the data to be in a particular format and link it to code that can be used by the processor to interpret it. Hence changing a file name extension will confuse the processor, as it won't interpret the data correctly. That's why changing the filename from *.jpg to *.exe won't show the image: the processor has been told to expect executable code, which the data within the file clearly isn't.
So, back to your original question: the image within the JPEG file has been encoded as a series of 1s and 0s in a specific order.
I'm not sure how exactly they are arranged, but as an example:
A picture was captured and stored as a bitmap at a resolution of 800 x 600 in 24-bit colour. The first pixel is stored as 3 bytes (8-bit binary each) representing a red, green, and blue value. The value of each byte dictates the intensity of that colour, from 0 to 255, with 0 being none at all and 255 being the highest value. Unsigned 255 in binary is 11111111; I won't confuse you with 2's complement for signed values. So the full picture will require a file of at least 1,440,000 bytes, or about 1,406 kilobytes (a kilobyte being 1024 bytes).
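The arithmetic is easy to check in a couple of lines of Python:
width, height = 800, 600
bytes_per_pixel = 3                # 24-bit colour: one byte each for R, G, B
raw_size = width * height * bytes_per_pixel
print(raw_size)                    # 1440000 bytes
print(round(raw_size / 1024, 2))   # 1406.25 kilobytes (1 KB = 1024 bytes)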
Binary such as 000010101011010101101010101 would be stored on a hard drive as actual microscopic bumps and troughs, by changing the polarity of the metallic grains on the disk in specific regions. Binary is conventionally written with the least significant bit on the right, the opposite of how most people read text.
If your question is really "how does it look": See Figure 4 on this page; it shows high resolution measurements of a hard drive.
Although googletorp's answer does not look very helpful, it's not totally untrue. To store binary data, the only thing you need is the possibility to have two different states for each storage unit (be it an on/off switch, hole or no hole in a punchcard, or, as in the case of hard drives, the direction of ferromagnetic particles).
The Wikipedia page for the BMP File Format contains an example (including all hex values) of a 2x2 pixel bitmap image; it should be very good at explaining the basics of the binary representation of an image.
In general, if you're really curious how the binary of a file looks, you could always use a hex viewer and take a look yourself :) I normally use od on Linux to dump the binary information of a file. I'm sure you can google a good hex editor for Windows (or maybe someone can suggest one).
Headers? Every file created contains header information, which is also stored as binary bits along with the data. The header bits of a file hold information such as the header length, file type, file location, and length. Each application is designed to read certain file types. If the application tries to open a file on the hard disk whose header indicates a file format the application does not support, it fails to read the file. Thus a text file cannot be opened using a media player, because a media player expects a file whose header matches an audio file format's binary pattern. The same goes for picture files.
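As a rough illustration of checking a header rather than trusting the file extension, here is a small Python sketch; the signature bytes for JPEG and PNG are well known, and the file name is a placeholder:
# Magic numbers: the first bytes of a file identify its real format,
# regardless of what the file name extension claims.
SIGNATURES = {
    b"\xff\xd8\xff": "JPEG image",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"ID3": "MP3 audio (with an ID3 tag)",
}

def sniff(path):
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return "unknown format"

print(sniff("photo.mp3"))  # reports "JPEG image" if it is really a renamed JPEG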