Enhance text quality in x264 encoding - H.264

I'm using x264 for remote desktop streaming. The goal is to achieve both low bitrate and high video quality within the computation budget. The parameter set I currently use almost achieves this goal, but it fails on frames with a lot of text (e.g. web-browsing scenes). The text in the image is blurred, which hurts the user experience.
I think it's the quantization in x264 that causes this. Quantization after the DCT transform eliminates high-frequency signals, which mostly correspond to the text in the image.
So, my question is how to improve the text quality in x264 encoding?
My idea: when the bitrate stays at a low level for a period of time,
set crf to 0 (lossless);
encode the current frame as an IDR frame and then send it;
restore the crf.
Also, a flag should be used to prevent resending while the bitrate stays low for a long time. I haven't tried this method, since I don't know how to manually mark a frame as an IDR frame and then encode it.

The answer to your question might be here: "x264: the best low-latency video streaming platform in the world". This might also be related: Psy RDO.
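For the "mark a frame as an IDR frame" part: in libx264 you set x264_picture_t.i_type to X264_TYPE_IDR on the input picture before calling x264_encoder_encode(), and you can change the CRF mid-stream with x264_encoder_reconfig(). Below is a minimal sketch of the scheme from the question; the TypeScript encoder binding (Encoder, reconfigure, encodeFrame) is hypothetical, the thresholds are assumptions, and note that CRF 0 is only truly lossless in the High 4:4:4 profile.

// Hypothetical binding around an x264-style encoder; the real libx264 calls
// are x264_encoder_reconfig() (change rc.f_rf_constant, i.e. the CRF) and
// setting x264_picture_t.i_type = X264_TYPE_IDR before x264_encoder_encode().
interface Encoder {
  reconfigure(opts: { crf: number }): void;
  encodeFrame(frame: Uint8Array, opts?: { forceIDR?: boolean }): Uint8Array;
}

const LOW_BITRATE_KBPS = 500; // assumed threshold for "low bitrate"
const QUIET_FRAMES = 60;      // assumed length of "a period of time", in frames
const NORMAL_CRF = 23;

let quietRun = 0;
let refreshSent = false;      // the flag that prevents resending

function onFrame(enc: Encoder, frame: Uint8Array, kbps: number): Uint8Array {
  if (kbps < LOW_BITRATE_KBPS) {
    quietRun++;
  } else {
    quietRun = 0;
    refreshSent = false;      // re-arm once traffic picks up again
  }
  if (quietRun >= QUIET_FRAMES && !refreshSent) {
    refreshSent = true;
    enc.reconfigure({ crf: 0 });                     // (near-)lossless
    const idr = enc.encodeFrame(frame, { forceIDR: true });
    enc.reconfigure({ crf: NORMAL_CRF });            // restore normal quality
    return idr;
  }
  return enc.encodeFrame(frame);
}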

Related

Is there a "law of diminishing returns" for converting images to Base64 as opposed to simply using the images themselves?

Say I have some icons on my site (32x32, 64x64, 128x128, etc.); converting them to Base64 makes sense, right?
Now say I have some high resolution images that are 2MB, 3MB, 4MB, and greater that I am using on my site. Does it make sense to convert these larger images to Base64 or is it "smarter" to simply keep using them as .jpg/.png/.gif/etc.?
If there is such a "law", "rule of thumb", etc. what is the line in the sand?
EDIT: While this post was marked as a duplicate, the linked "original" is from over 10 years ago; browser technology, computers, and the web itself have changed significantly since then. What would be the answer for today's technology?
The answer to the question is: yes, and it depends.
If we rephrase the question as: does the law of diminishing returns apply to using base64 for embedding images in a page?
a) Yes, the law applies
b) It depends on image count and size, and on your setup (i.e. HTTP vs. HTTP/2, connection type, etc.)
The reason is that more images require more connections, which imply more handshakes, unless you are using keep-alive connections or HTTP/2 streams. And if the images are bigger, converting them from base64 back to binary (plus decompression) takes more computing, so the bandwidth savings come at a CPU expense.
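As a concrete illustration of the embedding itself, here is a minimal Node sketch (the file name is illustrative) that turns an icon into a base64 data URI; note the roughly 33% size overhead that base64 adds, which is part of the trade-off:

import { readFileSync } from "node:fs";

// Read an icon and embed it as a base64 data URI (file name is illustrative).
const raw = readFileSync("icon-32x32.png");
const dataUri = "data:image/png;base64," + raw.toString("base64");

// Base64 encodes every 3 bytes as 4 characters, so the embedded
// form is roughly 33% larger than the binary file.
console.log("binary: " + raw.length + " B, base64: " + dataUri.length + " chars");
console.log('<img src="' + dataUri + '" width="32" height="32">');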
In general, if you have lots of small images (icons, for example), you could embed them as base64. But in that case you also have the following options:
a) Image atlas: combining all the small images into a single image (one load) and showing only the portion that you need on the page (see the sketch after this list).
b) Converting to alternative formats, such as fonts or SVG, and again rendering only what you need. Example: Open Iconic.
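A minimal sketch of option (a), assuming Node with the sharp image library (file names and sizes are illustrative): it pastes the icons side by side into one sprite sheet, which the page then crops via CSS background-position.

import sharp from "sharp";

// Combine three 32x32 icons into one 96x32 sprite sheet (names illustrative).
const icons = ["home.png", "search.png", "user.png"];

async function buildAtlas(): Promise<void> {
  await sharp({
    create: { width: 32 * icons.length, height: 32, channels: 4,
              background: { r: 0, g: 0, b: 0, alpha: 0 } },
  })
    .composite(icons.map((file, i) => ({ input: file, left: 32 * i, top: 0 })))
    .png()
    .toFile("atlas.png");
}

buildAtlas().catch(console.error);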

Writing an H.264 Encoder / Decoder from Scratch?

I am thinking about writing an H.264 encoder / decoder from scratch, to be able to integrate the tech into a composite product. The first implementation would be written in Java.
I am used to implementing scientific papers and the like, so I should at least have the basic math understanding.
What would be the best starting point, and what should I focus on? I know that H.264 is basically a mix of existing techniques.
What are the most important things to implement?
Any idea how many hours of work the first useful version will take?
The main objective is to be very fast while maintaining good compression.
How many hours? Maybe 20,000. The decoder specification alone is a document of over 750 pages. And the decoder is the easy part.
After consideration :), I will stick with my own solution based on PNG and JPEG, without motion vectors. I wrote a small solution that compresses parts of the image with those formats and uses filters to degrade quality by applying blur, reducing the number of colors, or even lowering the resolution. It works well enough for now.
If I need better quality I will start looking at VP9 in more detail.
The only drawback is the lack of hardware encoding support, which might force me / us to look into H.264 again.
Currently I can deliver 60+ frames per second for everyday situations and scale down to 15 frames per second (with poor quality) for video content, but it is good enough to grab a security-cam screen and see if something is wrong.

Facebook-like image viewing

I have been wondering how Facebook loads images so fast.
I am not on any project related to my question; I'm just really interested.
From some observation, I noticed that Facebook loads a low-quality picture temporarily and shows the high-quality one as soon as it is fully loaded.
This makes it seem like the image loaded very fast, when really it was just a low-quality one at first.
My question is: how does Facebook implement that?
When I put an image on my site, it loads from top to bottom in full quality right away.
Is this done through JavaScript/jQuery AJAX or something?
Is it done through PHP?
Did Facebook make two versions on their end, low and high quality, and send the low-quality one first?
Thanks :)
Yes, you are right: Facebook loads a low-quality image first, then renders it based on network speed. This method uses "progressive JPEGs", another type of JPEG that is rendered, as the name suggests, progressively.
First you see a low quality version of the whole image. Then, as more of the image information arrives over the network, the quality gradually improves.
From a usability perspective, progressive is usually good, because the user gets feedback that something is going on. Also, if you're on a slow connection, a progressive JPEG is preferable because you don't need to wait for the whole image to arrive to get an idea of whether it is what you wanted. If not, you can click away from the page or hit the back button without waiting for the (potentially large) high-quality image.
There is conflicting information in blogs and books about whether progressive JPEGs are bigger or smaller than baseline JPEGs in terms of file size.
If you use a tool like Photoshop or any design tool, when saving a document as JPG it will offer you two options: Baseline and Progressive.
But you can achieve the same at runtime as well, if you write an API that converts your baseline images to progressive images for display on the webpage.
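For example, a minimal sketch of such a conversion, assuming Node with the sharp image library (file names and quality setting are illustrative):

import sharp from "sharp";

// Re-encode a baseline JPEG as a progressive JPEG.
async function toProgressive(input: string, output: string): Promise<void> {
  await sharp(input)
    .jpeg({ progressive: true, quality: 80 }) // progressive scan order
    .toFile(output);
}

toProgressive("baseline.jpg", "progressive.jpg").catch(console.error);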
From my understanding, when you click on an image in the Facebook UI, the viewer appears with the low-quality thumbnail version (or a slightly larger version of the thumbnail) loaded. Because of browser caching, that low-quality image displays very quickly.
Then, in the background, they use JavaScript to load the higher-quality image, and with JavaScript events they can detect when it has finished loading. Once it has, they replace the lower-quality version with the higher-quality one.
So from the UI perspective, it's only JavaScript. When you upload the photo, they create multiple sizes of the image to make this effect possible.
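A minimal sketch of that swap (the element ID and URL are illustrative):

// Preload the high-quality image in the background, then swap it in.
function upgradeImage(img: HTMLImageElement, highResUrl: string): void {
  const loader = new Image();
  loader.onload = () => {
    img.src = highResUrl; // by now the file is in the browser cache
  };
  loader.src = highResUrl; // kicks off the background download
}

// The visible element already shows the cached low-quality thumbnail.
const photo = document.getElementById("photo") as HTMLImageElement;
upgradeImage(photo, "/photos/12345-full.jpg");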

Video that lets the user control the camera angle

The scenario: A guy walks along a route, through the crowd at a pool party.
The camera setup is a customised rig with an array of GoPros covering 360 degrees of rotation.
The end result needs to be a video that'll let the user:
click & drag the video to change his viewpoint. So for example, he can turn the angle to behind him and see where the guy has walked from, or look sideways as he walks (likely some up/down movement too)
pause playback
zoom in/out
So for example, you spot a hot girl in the crowd. You'd pause, zoom in, and then play the video, watching her as the guy walks past her.
How could this be achieved with HTML5 (non-Flash) methods?
I don't even know what technologies would be required to achieve something like this, so I'm hoping that someone with experience in something similar could give me some pointers as to the required:
coding languages
server technologies
bandwidth considerations
etc
Thanks for your help!
(PS: this is a paid client job, so if you can do exactly this, let's talk about a quote?)
You'd be attempting something quite state-of-the-art.
The way I'd experiment is to stream video to the client and display it using WebGL, which the client can then manipulate without latency.
http://riaconnection.wordpress.com/2011/11/03/testing-live-video-streaming-to-webgl-and-html5-video-tag/
One way might be to stream six feeds - top, bottom, left, right, back, front. These would be pre-processed so that, when displayed as a cube viewed from the center, the perspective is correct.
If the client can zoom in and out, you'll need higher-resolution streams, and six of them would mean very high bandwidth. You'll have to decide on a trade-off between bandwidth, quality, and latency: for example, if you only fetch higher-quality video after the client zooms in or changes pan / tilt, you trade latency for better quality and bandwidth, at the cost of higher server resource requirements.
There are plenty of video processing libraries for PHP, which would probably be my choice of server-side language, but I'm biased.
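As a rough starting point for the cube approach, here is a minimal browser sketch assuming the three.js library (stream URLs, sizes, and the drag input handling are illustrative):

import * as THREE from "three";

// Six video feeds become textures on the inside faces of a cube;
// the camera sits at the cube's center and is rotated by drag input.
const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(75, innerWidth / innerHeight, 0.1, 10);
const renderer = new THREE.WebGLRenderer();
renderer.setSize(innerWidth, innerHeight);
document.body.appendChild(renderer.domElement);

// BoxGeometry material order is +x, -x, +y, -y, +z, -z.
const faces = ["right", "left", "top", "bottom", "front", "back"];
const materials = faces.map((face) => {
  const video = document.createElement("video");
  video.src = "/streams/" + face + ".mp4"; // illustrative URLs
  video.muted = true;                      // allows autoplay
  video.play();
  // BackSide: the camera is inside the cube, so render the inner faces.
  return new THREE.MeshBasicMaterial({
    map: new THREE.VideoTexture(video),
    side: THREE.BackSide,
  });
});
scene.add(new THREE.Mesh(new THREE.BoxGeometry(2, 2, 2), materials));

let yaw = 0, pitch = 0; // update these from mouse-drag events (omitted)
function animate(): void {
  requestAnimationFrame(animate);
  camera.rotation.set(pitch, yaw, 0, "YXZ");
  renderer.render(scene, camera);
}
animate();

Zoom then falls out almost for free by adjusting camera.fov (followed by camera.updateProjectionMatrix()), and pause is simply pausing all six video elements together.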

Can I start or seek to sub-second intervals with NetStream?

I am working on a Flash video player and am implementing the ability to start a video at time x within an FLV (served from FMS). I am able to start it x seconds into a stream without any issue using
netStream.play(source, startTime);
but as far as I can tell, it only supports whole seconds. I am looking to be able to give a start time (or even a seek time, if that is supported) in milliseconds, or really anything more precise than whole seconds.
Does anyone know of a way to achieve this, even by monkey-patching the fl classes?
Thanks,
Doug
Well, seek will let you seek by a number of frames, but the spec says it only seeks to the closest I-frame in the FLV. That will be a problem with any method you use, because I-frames are the only frames that actually contain the whole picture (that's the gist of it).
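To illustrate why that limits seek precision, here is a small conceptual sketch (the keyframe times are illustrative) of how a seek target snaps to the nearest preceding I-frame:

// A player can only start decoding at an I-frame (keyframe), so the
// effective start time is the closest keyframe at or before the target.
function effectiveSeek(keyframeTimes: number[], targetSec: number): number {
  let best = keyframeTimes[0];
  for (const t of keyframeTimes) {
    if (t <= targetSec && t > best) best = t;
  }
  return best;
}

// With a 2-second keyframe interval, a request for 7.35 s snaps back to 6 s,
// which is why millisecond-precise start times don't survive the seek.
console.log(effectiveSeek([0, 2, 4, 6, 8], 7.35)); // 6

The usual workaround is to encode the FLV with a denser keyframe interval, trading file size for seek granularity.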