Why is it possible to use BT.709 in H.264 to represent more colors than BT.601?

Why is it possible to use BT.709 in H.264 to represent more colors than BT.601? I thought that with YUV it is all just Y, U, and V data, and converting to RGB only uses a different matrix. Using a different matrix may change how much U or V a given color maps to, but that shouldn't let BT.709 indicate more colors than BT.601. Or is there a color that BT.709 cannot represent but BT.601 can? Can anyone tell me where my mistake is?

601 and 709 can display the same number of colors. 601's colors tend to map better to what a CRT can actually display, whereas 709 maps to LCD. However, 601 more commonly uses partial range, meaning luma is encoded using the range 16-235 (chroma 16-240), whereas it is common to use full range with 709 (0-255). This is because partial range leaves padding at the extremes for analog distribution. Since the broadcast TV digital switchover happened in the US around the same time as everybody changed from CRT to LCD, the range and the color space tend to be linked.
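To make that concrete, here is a minimal sketch (not tied to any particular decoder) of the same 8-bit YCbCr triple being converted with the BT.601 and BT.709 matrices, in either limited or full range. Each component still has exactly the same number of possible code values; only the mapping to RGB differs:

```java
// Sketch: convert one YCbCr sample to RGB with a given matrix and range.
// Kr/Kb are the only things that differ between BT.601 and BT.709 here.
public final class YCbCrDemo {

    // BT.601: Kr = 0.299,  Kb = 0.114
    // BT.709: Kr = 0.2126, Kb = 0.0722
    static double[] toRgb(int y, int cb, int cr, double kr, double kb, boolean fullRange) {
        double yp, pb, pr;
        if (fullRange) {                 // 0..255 code values
            yp = y / 255.0;
            pb = (cb - 128) / 255.0;
            pr = (cr - 128) / 255.0;
        } else {                         // limited/"video" range: luma 16..235, chroma 16..240
            yp = (y - 16) / 219.0;
            pb = (cb - 128) / 224.0;
            pr = (cr - 128) / 224.0;
        }
        double r = yp + 2.0 * (1.0 - kr) * pr;
        double b = yp + 2.0 * (1.0 - kb) * pb;
        double g = (yp - kr * r - kb * b) / (1.0 - kr - kb);
        return new double[] { clamp(r), clamp(g), clamp(b) };
    }

    static double clamp(double v) { return Math.max(0.0, Math.min(1.0, v)); }

    public static void main(String[] args) {
        int y = 120, cb = 90, cr = 200;  // an arbitrary sample
        double[] rgb601 = toRgb(y, cb, cr, 0.299, 0.114, false);
        double[] rgb709 = toRgb(y, cb, cr, 0.2126, 0.0722, false);
        // Same triplet of code values, slightly different RGB result:
        System.out.printf("BT.601: %.3f %.3f %.3f%n", rgb601[0], rgb601[1], rgb601[2]);
        System.out.printf("BT.709: %.3f %.3f %.3f%n", rgb709[0], rgb709[1], rgb709[2]);
    }
}
```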

DVD used SMPTE 170M a.k.a. SMPTE C primaries (SMPTE C uses a D65 white point), while the BT.601 matrix was derived from ITU-R Rec. BT.470 System M a.k.a. NTSC 1953 primaries and white point (that is, Illuminant C, not D65). (PAL DVD used the BT.601 matrix too, and so does JPEG, which defaults to BT.709 primaries.) What that means is that there is a mismatch between the primaries/white point and the matrix, while there should be none for optimal use of code points. So when a new matrix was derived from the BT.709 primaries, and an even newer one from the BT.2020 primaries, it was a better fit. The conversion from primaries and white point to the two values that define a YCbCr matrix is specified in ITU-T Rec. H.273.
What is also a problem is that the BT.709 primaries are the de facto standard for SDR displays; even on a wide-gamut (WCG) display one must limit output to BT.709 (that is what the Natural mode on Galaxy devices, iPhones, and LG TVs does). But after decoding BT.601 content you will have SMPTE C or PAL primaries, which MUST be color managed to BT.709 primaries, and that also introduces some loss.
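To make the "derived from primaries and white point" part concrete, here is a sketch of that derivation (essentially the computation H.273 describes): convert each primary's chromaticity to XYZ, solve for the scale factors that reproduce the white point, and the resulting Y contributions are the luma weights Kr, Kg, Kb that define the matrix. The class and method names are just for illustration.

```java
// Sketch: derive luma weights (Kr, Kg, Kb) from chromaticity coordinates.
// For BT.709 primaries + D65 this prints approximately 0.2126, 0.7152, 0.0722.
public final class LumaWeights {

    // Solve a 3x3 linear system M * s = w using Cramer's rule.
    static double[] solve3(double[][] m, double[] w) {
        double d = det(m);
        double[] s = new double[3];
        for (int col = 0; col < 3; col++) {
            double[][] mc = new double[3][3];
            for (int r = 0; r < 3; r++)
                for (int c = 0; c < 3; c++)
                    mc[r][c] = (c == col) ? w[r] : m[r][c];
            s[col] = det(mc) / d;
        }
        return s;
    }

    static double det(double[][] m) {
        return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
             - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
             + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
    }

    // xy chromaticity -> XYZ with Y normalized to 1.
    static double[] xyToXYZ(double x, double y) {
        return new double[] { x / y, 1.0, (1.0 - x - y) / y };
    }

    static double[] lumaWeights(double[] r, double[] g, double[] b, double[] white) {
        double[] R = xyToXYZ(r[0], r[1]), G = xyToXYZ(g[0], g[1]), B = xyToXYZ(b[0], b[1]);
        double[] W = xyToXYZ(white[0], white[1]);
        // Columns of M are the XYZ vectors of the primaries.
        double[][] m = {
            { R[0], G[0], B[0] },
            { R[1], G[1], B[1] },
            { R[2], G[2], B[2] },
        };
        // Scale factors such that the scaled primaries sum to the white point;
        // since each primary has Y = 1, those scales are directly Kr, Kg, Kb.
        return solve3(m, W);
    }

    public static void main(String[] args) {
        double[] k709 = lumaWeights(
            new double[] { 0.640, 0.330 },    // R
            new double[] { 0.300, 0.600 },    // G
            new double[] { 0.150, 0.060 },    // B
            new double[] { 0.3127, 0.3290 }); // D65 white point
        System.out.printf("Kr=%.4f Kg=%.4f Kb=%.4f%n", k709[0], k709[1], k709[2]);
    }
}
```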

Related

What does Nvidia mean when they say samples per pixel in regards to DLSS?

"NVIDIA researchers have successfully trained a neural network to find
these jagged edges and perform high-quality anti-aliasing by
determining the best color for each pixel, and then apply proper
colors to create smoother edges and improve image quality. This
technique is known as Deep Learning Super Sample (DLSS). DLSS is like
an “Ultra AA” mode-- it provides the highest quality anti-aliasing
with fewer artifacts than other types of anti-aliasing.
DLSS requires a training set of full resolution frames of the aliased
images that use one sample per pixel to act as a baseline for
training. Another full resolution set of frames with at least 64
samples per pixel acts as the reference that DLSS aims to achieve."
https://developer.nvidia.com/rtx/ngx
At first I thought of "sample" as it is used in graphics, an intersection of a channel and a pixel. But that doesn't really make sense in this context: going from 1 channel to 64 channels?
So I am thinking it is "sample" as in the statistics term, but I don't understand how a static image could come up with 64 variations to compare to. Even going from FHD to 4K UHD is only 4 times the number of pixels. Trying to parse that second paragraph, I really can't make any sense of it.
16 bits × RGBA equals 64 samples per pixel maybe? They say at least, so higher accuracy could take as much as 32 bits × RGBA or 128 samples per pixel for doubles.
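For what it's worth, in rendering "N samples per pixel" normally means N sub-pixel evaluations that are averaged into one final pixel (i.e. supersampling), not N channels or N bits. A minimal sketch of that idea; shadeAt() is a hypothetical stand-in for whatever the renderer actually evaluates:

```java
import java.util.Random;

// Sketch: what "N samples per pixel" usually means in rendering.
// Each pixel is evaluated at N jittered sub-pixel positions and the results are averaged.
public final class SamplesPerPixel {

    // Hypothetical stand-in for shading a point in the image plane.
    static double shadeAt(double x, double y) {
        return (Math.sin(x * 0.1) * Math.cos(y * 0.1) + 1.0) / 2.0;  // arbitrary pattern
    }

    static double renderPixel(int px, int py, int samplesPerPixel, Random rng) {
        double sum = 0.0;
        for (int s = 0; s < samplesPerPixel; s++) {
            // Jitter the sample position inside the pixel footprint.
            double x = px + rng.nextDouble();
            double y = py + rng.nextDouble();
            sum += shadeAt(x, y);
        }
        return sum / samplesPerPixel;   // average of all samples = final pixel value
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double aliased = renderPixel(10, 20, 1, rng);    // 1 spp: the "aliased" input
        double reference = renderPixel(10, 20, 64, rng); // 64 spp: the smoother reference
        System.out.printf("1 spp: %.4f, 64 spp: %.4f%n", aliased, reference);
    }
}
```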

What are the benefits of Gray code in evolutionary computation?

Books and tutorials on genetic algorithms explain that encoding an integer in a binary genome using Gray code is often better than using standard base 2. The reason given is that a change of +1 or -1 in the encoded integer requires only one bit flip for any number. In other words, neighboring integers are also neighboring in Gray code, and the optimization problem in Gray encoding has at most as many local optima as the original numeric problem.
Are there other benefits to using Gray code, compared to standard base 2?
Gray encoding is used to avoid the occurrence of Hamming walls, as explained in Section 3.5 of this paper.
Basically, a Hamming wall is a point at which it becomes rare or highly unlikely that the GA will mutate in exactly the right way to produce the next step up in fitness.
Due to the properties of Gray coding, this is much less likely to happen.
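To make the neighbor property concrete, here is a small sketch: the binary-reflected Gray code of n is n ^ (n >> 1), and adjacent integers always end up exactly one bit apart, whereas plain binary can flip many bits at once (e.g. 7 -> 8):

```java
// Sketch: binary-reflected Gray code and the one-bit-difference property.
public final class GrayCodeDemo {

    static int toGray(int n) { return n ^ (n >>> 1); }

    static int hammingDistance(int a, int b) { return Integer.bitCount(a ^ b); }

    public static void main(String[] args) {
        for (int n = 0; n < 8; n++) {
            int g = toGray(n), gNext = toGray(n + 1);
            System.out.printf("%d -> %4s (bit distance to next: %d)%n",
                    n, Integer.toBinaryString(g), hammingDistance(g, gNext));
        }
        // Contrast with plain binary: 7 -> 8 flips four bits (0111 -> 1000),
        // which is the kind of jump that produces a Hamming wall.
        System.out.println("plain binary 7 -> 8 distance: " + hammingDistance(7, 8));
    }
}
```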

Progressive JPEG vs baseline JPEG

I have a web gallery where I display images of varying file sizes and resolutions uploaded by users. Currently all the images are baseline, so I would like to know whether it would really have any significant impact if I converted them to progressive images. What are the advantages and tradeoffs of using progressive images?
The JPEG standard defines a variety of compression modes. Only three of these are in widespread use:
Baseline Sequential
Extended Sequential
Progressive
The only difference between the first two is the number of tables allowed. Otherwise, they are encoded and decoded in exactly the same way.
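A quick way to check which of these modes a given file uses is to look at its start-of-frame marker: SOF0 (0xFFC0) is baseline sequential, SOF1 (0xFFC1) extended sequential, SOF2 (0xFFC2) progressive. A rough sketch that walks the marker segments of a well-formed file (other markers are simply skipped):

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Sketch: report which JPEG frame mode a file uses by finding its SOF marker.
public final class JpegMode {

    public static void main(String[] args) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            if (in.readUnsignedShort() != 0xFFD8) {           // SOI marker
                System.out.println("not a JPEG");
                return;
            }
            while (true) {
                int marker = in.readUnsignedShort();
                switch (marker) {
                    case 0xFFC0: System.out.println("baseline sequential"); return;
                    case 0xFFC1: System.out.println("extended sequential"); return;
                    case 0xFFC2: System.out.println("progressive"); return;
                    case 0xFFDA: System.out.println("reached scan data without a SOF0/1/2"); return; // SOS
                    default:
                        int length = in.readUnsignedShort();  // segment length includes these 2 bytes
                        in.skipBytes(length - 2);
                }
            }
        }
    }
}
```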
JPEG divides images into frames that are then divided into scans. The modes above only permit one frame. The frame is the image. The scans are passes through the image data. A scan may contain the data for one color component, or it may be interleaved and contain data for multiple color components.
A grayscale sequential JPEG stream will have one scan.
A color sequential JPEG stream may have one or three scans.
JPEG takes 8x8 blocks of pixel data and applies the discrete cosine transform to that data. The 64 pixel values become 64 DCT coefficients. The first DCT coefficient is called the "DC" coefficient and the other 63 are called "AC" coefficients.
This is confusing terminology that draws on an analogy with DC and AC current. The DC coefficient is analogous to the average pixel value of the block.
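To see why: with the DCT normalization JPEG uses, F(0,0) works out to exactly 8 times the mean of the 64 level-shifted samples. A small sketch:

```java
// Sketch: with JPEG's DCT normalization, the DC coefficient F(0,0)
// equals 8 times the average of the 64 level-shifted samples in the block.
public final class DcCoefficient {

    static double dct(int[][] block, int u, int v) {
        double cu = (u == 0) ? 1.0 / Math.sqrt(2) : 1.0;
        double cv = (v == 0) ? 1.0 / Math.sqrt(2) : 1.0;
        double sum = 0.0;
        for (int x = 0; x < 8; x++)
            for (int y = 0; y < 8; y++)
                sum += block[x][y]
                     * Math.cos((2 * x + 1) * u * Math.PI / 16)
                     * Math.cos((2 * y + 1) * v * Math.PI / 16);
        return 0.25 * cu * cv * sum;
    }

    public static void main(String[] args) {
        int[][] block = new int[8][8];
        double mean = 0.0;
        for (int x = 0; x < 8; x++)
            for (int y = 0; y < 8; y++) {
                block[x][y] = (x * 8 + y) - 128;   // arbitrary level-shifted samples
                mean += block[x][y];
            }
        mean /= 64.0;
        System.out.printf("DC = %.3f, 8 * mean = %.3f%n", dct(block, 0, 0), 8 * mean);
    }
}
```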
In sequential JPEG, the 64 coefficients in a block are encoded together (with the DC and AC coefficients encoded differently). In progressive JPEG, the DC and AC coefficients are sent in separate scans, and each scan encodes bit ranges (of configurable size) within the coefficients. In theory, you could have a separate scan for each bit of each component.
Progressive JPEG is much more complicated to implement and use. If you are creating an encoder for sequential JPEG, you just need to give the caller the option to use interleaved or non-interleaved scans. For progressive JPEG, your encoder needs a mechanism for the caller to specify how many scans to use and which bits should be encoded in each scan.
Progressive encoding can be slower than sequential because you have to make multiple passes over the data.
The speed issue in progressive decoding depends upon how it is done. If you decode the entire image at once, progressive is possibly marginally slower than sequential. If your decoder shows the image fading in as it processes the stream it will be much slower than sequential. Each time you update the display, you have to do the inverse DCT, upsampling, and color transformation.
On the other hand, it is possible to get much better compression using progressive JPEG with well-tuned scans.
There is no difference in quality between progressive and sequential.
This book describes the processes:
https://www.amazon.com/Compressed-Image-File-Formats-JPEG/dp/0201604434/ref=asap_bc?ie=UTF8
The only difference is that progressive images are encoded in a way that allows browsers to display a rough preview of the image while it is still being downloaded, which becomes progressively better in quality until the download is complete. A baseline image will load from top to bottom; a progressive image will load from low resolution to high resolution.
In browsers which do not support progressive images, you won't see anything until the entire image has been loaded. (Nowadays all halfway modern browsers support progressive JPEGs.)
You can see animations of the difference in action, e.g. here: https://www.youtube.com/watch?v=TOc15-2apY0

What is COLOR_FormatYUV420Flexible?

I wish to encode video/avc on my Android encoder. The encoder (Samsung S5) publishes COLOR_FormatYUV420Flexible as one of its supported formats. Yay!
But I don't quite understand what it is and how I can use it. The docs say:
Flexible 12 bits per pixel, subsampled YUV color format with 8-bit chroma and luma components.
Chroma planes are subsampled by 2 both horizontally and vertically. Use this format with Image. This format corresponds to YUV_420_888, and can represent the COLOR_FormatYUV411Planar, COLOR_FormatYUV411PackedPlanar, COLOR_FormatYUV420Planar, COLOR_FormatYUV420PackedPlanar, COLOR_FormatYUV420SemiPlanar and COLOR_FormatYUV420PackedSemiPlanar formats
This seems to suggest that I can use this constant with just about any kind of YUV data: planar, semi-planar, packed, etc. That seems unlikely: how would the encoder know how to interpret the data unless I specify exactly where the U/V values are?
Is there any metadata that I need to provide in addition to this constant? Does it just work?
Almost, but not quite.
This constant can be used with almost any form of YUV (planar, semi-planar, packed, and all that). But the catch is that it is not you who chooses the layout, with the encoder having to support it; it's the other way around. The encoder will choose the surface layout and describe it via the flexible description, and you need to support whichever layout it happens to be.
In practice, when using this, you don't call getInputBuffers() or getInputBuffer(int index); you call getInputImage(int index), which returns an Image containing pointers to the start of the three planes, along with their row and pixel strides.
Note - when calling queueInputBuffer afterwards, you have to supply a size parameter, which can be tricky to figure out - see https://stackoverflow.com/a/35403738/3115956 for more details on that.
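As a rough sketch of what that looks like (assuming the codec was configured with COLOR_FormatYUV420Flexible and your frame data is in separate, tightly packed Y, U and V arrays; the class and helper names here are just for illustration):

```java
import android.media.Image;
import java.nio.ByteBuffer;

// Sketch: copy an I420-style frame into whatever layout the encoder chose,
// by honoring the row and pixel strides reported by the flexible Image.
public final class FlexibleYuvWriter {

    static void fillPlane(Image.Plane plane, byte[] src, int width, int height) {
        ByteBuffer dst = plane.getBuffer();
        int rowStride = plane.getRowStride();
        int pixelStride = plane.getPixelStride();   // may be 2 for semi-planar layouts
        for (int row = 0; row < height; row++) {
            if (pixelStride == 1) {
                dst.position(row * rowStride);
                dst.put(src, row * width, width);   // tightly packed row
            } else {
                for (int col = 0; col < width; col++) {
                    dst.position(row * rowStride + col * pixelStride);
                    dst.put(src[row * width + col]);
                }
            }
        }
    }

    // y/u/v are tightly packed planes; chroma is subsampled 2x2 for 4:2:0.
    static void fillYuv420(Image image, byte[] y, byte[] u, byte[] v, int width, int height) {
        Image.Plane[] planes = image.getPlanes();   // Y, U, V order for YUV_420_888
        fillPlane(planes[0], y, width, height);
        fillPlane(planes[1], u, width / 2, height / 2);
        fillPlane(planes[2], v, width / 2, height / 2);
    }
}
```

You would call fillYuv420 on the Image returned by getInputImage(index), then queue the buffer as described above.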

Cesium Resampling

I know that Cesium offers several different interpolation methods, including linear (or bilinear in 2D), Hermite, and Lagrange. One can use these methods to resample sets of points and/or create curves that approximate sampled points, etc.
However, the question I have is what method does Cesium use internally when it is rendering a 3D scene and the user is zooming/panning all over the place? This is not a case where the programmer has access to the raster, etc, so one can't just get in the middle of it all and call the interpolation functions directly. Cesium is doing its own thing as quickly as it can in response to user control.
My hunch is that the default is bilinear, but I don't know that nor can I find any documentation that explicitly says what is used. Further, is there a way I can force Cesium to use a specific resampling method during these activities, such as Lagrange resampling? That, in fact, is what I need to do: force Cesium to employ Lagrange resampling during scene rendering. Any suggestions would be appreciated.
EDIT: Here's a more detailed description of the problem…
Suppose I use Cesium to set up a 3-D model of the Earth including a greyscale image chip at its proper location on the model Earth's surface, and then I display the results in a Cesium window. If the view point is far enough from the Earth's surface, then the number of pixels displayed in the image chip part of the window will be fewer than the actual number of pixels that are available in the image chip source. Some downsampling will occur. Likewise, if the user zooms in repeatedly, there will come a point at which there are more pixels displayed across the image chip than the actual number of pixels in the image chip source. Some upsampling will occur. In general, every time Cesium draws a frame that includes a pixel data source there is resampling happening. It could be nearest neighbor (doubt it), linear (probably), cubic, Lagrange, Hermite, or any one of a number of different resampling techniques. At my company, we are using Cesium as part of a large government program which requires the use of Lagrange resampling to ensure image quality. (The NGA has deemed that best for its programs and analyst tools, and they have made it a compliance requirement. So we have no choice.)
So here's the problem: while the user is interacting with the model, for instance zooming in, the drawing process is not in the programmer's control. The resampling is either happening in the Cesium layer itself (hopefully) or in even still lower layers (for instance, the WebGL functions that Cesium may be relying on). So I have no clue which technique is used for this resampling. Worse, if that technique is not Lagrange, then I don't have any clue how to change it.
So the question(s) would be this: is Cesium doing the resampling explicitly? If so, then what technique is it using? If not, then what drawing packages and functions are Cesium relying on to render an image file onto the map? (I can try to dig down and determine what techniques those layers may be using, and/or have available.)
UPDATE: Wow, my original answer was a total misunderstanding of your question, so I've rewritten it from scratch.
With the new edits, it's clear your question is about how images are resampled for the screen while rendering. These images are texture maps in WebGL, and the process of getting them to the screen quickly is implemented in hardware, on the graphics card itself. Software on the CPU is not performant enough to map individual pixels to the screen one at a time, which is why we have hardware-accelerated 3D cards.
Now for the bad news: this hardware supports nearest neighbor, linear, and mipmapping. That's it. 3D graphics cards do not use any fancier interpolation, as it needs to be done in a fraction of a second to keep the frame rate as high as possible.
Mipmapping is described well by @gman in his article WebGL 3D Textures. It's a long article, but search for the word "mipmap" and skip ahead to his description of it. Basically, a single image is reduced into smaller images prior to rendering, so an appropriately-sized starting point can be chosen at render time. But there will always be a final mapping to the screen, and as you can see, the choices are NEAREST or LINEAR.
Quoting @gman's article here:
You can choose what WebGL does by setting the texture filtering for each texture. There are 6 modes
NEAREST = choose 1 pixel from the biggest mip
LINEAR = choose 4 pixels from the biggest mip and blend them
NEAREST_MIPMAP_NEAREST = choose the best mip, then pick one pixel from that mip
LINEAR_MIPMAP_NEAREST = choose the best mip, then blend 4 pixels from that mip
NEAREST_MIPMAP_LINEAR = choose the best 2 mips, choose 1 pixel from each, blend them
LINEAR_MIPMAP_LINEAR = choose the best 2 mips. choose 4 pixels from each, blend them
I guess the best news I can give you is that Cesium uses the best of those, LINEAR_MIPMAP_LINEAR, to do its own rendering. If you have a strict requirement for more time-consuming imagery interpolation, that means you have a requirement to not use a realtime 3D hardware-accelerated graphics card, as there is no way to do Lagrange image interpolation during a realtime render.
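WebGL sets those modes through texParameteri, and since WebGL mirrors OpenGL ES, the equivalent setup through Android's GLES20 bindings looks like this (a sketch, assuming a 2D texture has already been created, bound, and uploaded):

```java
import android.opengl.GLES20;

// Sketch: selecting trilinear filtering (LINEAR_MIPMAP_LINEAR) for a bound 2D texture.
public final class TextureFiltering {

    static void useTrilinearFiltering() {
        // Build the mip chain for the currently bound texture.
        GLES20.glGenerateMipmap(GLES20.GL_TEXTURE_2D);

        // Minification: pick the two best mip levels, blend 4 texels from each.
        GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D,
                GLES20.GL_TEXTURE_MIN_FILTER, GLES20.GL_LINEAR_MIPMAP_LINEAR);

        // Magnification: mips don't apply; only NEAREST or LINEAR are available.
        GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D,
                GLES20.GL_TEXTURE_MAG_FILTER, GLES20.GL_LINEAR);
    }
}
```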