Understanding/controlling MLT melt slideshow?

Consider the following bash script (on Ubuntu 18.04, melt 6.6.0), which uses melt to make a slideshow and play it locally in a window (SDL consumer), mostly copied from https://mltframework.org/blog/making_nice_slideshows/. (Edit: I'm aware that it's possible to specify files individually, as in https://superuser.com/questions/833232/create-video-with-5-images-with-fadein-out-effect-in-ffmpeg/834035#834035 - but that approach seems to scale images during transitions, and takes quite a while to "render" before playing in the SDL window, while this one has nearly instant playback):
echo "
description=DV PAL
frame_rate_num=25
frame_rate_den=1
width=720
height=576
progressive=0
sample_aspect_num=59
sample_aspect_den=54
display_aspect_num=4
display_aspect_den=3
colorspace=601
" > my-melt.profile
mkdir tmppics
convert -background lightblue -fill blue -size 3840x2160 -pointsize 200 -gravity center label:"Test A" tmppics/pic_01.jpg
convert -background lightblue -fill blue -size 3840x2160 -pointsize 200 -gravity center label:"Test B" tmppics/pic_02.jpg
melt -verbose -profile ./my-melt.profile \
./tmppics/.all.jpg ttl=6 \
-attach crop center=1 \
-filter luma cycle=6 duration=4 \
-consumer sdl
When I run the above command, the video shows the two images looping, but the frame counter keeps going, increasing indefinitely. How do I make it stop after exactly as many frames as one loop is long?
As far as I can see, the size of the output video is controlled by a profile; that is, even if I don't specify -profile, a default one is assumed; is that correct?
The original images look like this:
... and the video looks like this:
... which means the aspect ratio is wrong; additionally I can see jagged edges, meaning the scaled image in the video is not antialiased.
How do I make the image fit the video size with the correct aspect ratio, and with antialiasing/smoothing? (I guess it has to do with -attach crop center=1, but I couldn't find documentation on that.)
When viewing stuff in SDL and stepping through frames, are frames numbered 0-based, or are they 1-based, with frame 0 simply showing the same frame as frame 1?
If I use ttl=6 and -filter luma cycle=6 duration=4, I get this:
... that is, visible transition starts at frame 7 (frame 6 is full image A), lasts for frames 7 and 8, and ends at frame 9 (which is full image B); then again at frames 13 and 14 (frame 15 is full image A)
However, if I use ttl=6 and -filter luma cycle=6 duration=2, then I get this:
... that is, there is no transition, image instantly changes at frame 7, then again at frame 13, etc.
So I'd call the first case a transition duration of 2 frames, and the second case a duration of 0 frames - yet the options are duration=4 and duration=2, respectively. Can anyone explain why? Where did those 2 frames of difference go?
Can I - and if so, how - do the same kind of slideshow, except with fade to black? I'd like to define a "time to live" (ttl) of 6 frames per image, and a transition of 4 frames, such that:
first, 4 frames are shown of image A;
then one frame image A faded, followed by one frame black (amounting to 6 frames TTL for image A, the last 2 transition);
then two frames image B faded (amounting to 4 frames transition with previous 2), followed by two more frames image B full (so 4 frames here of image B);
then one frame image B faded, followed by one frame black (amounting to 6 frames TTL for image B);
... etc.
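I suspect this might be buildable from the melt command line by slotting a color:black producer between the images and luma-mixing both junctions; here is a hedged, untested sketch (the color producer and -mix/-mixer options do exist in melt, but all the out= and -mix frame counts below are guesses to tune, remembering out is inclusive):
melt -profile ./my-melt.profile \
./tmppics/pic_01.jpg out=5 \
color:black out=3 -mix 4 -mixer luma \
./tmppics/pic_02.jpg out=5 -mix 4 -mixer luma \
color:black out=3 -mix 4 -mixer luma \
-consumer sdl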
Is it possible to persuade melt to use globbing to select images for the slideshow, instead of using .all.jpg? As far as I can tell from the MLT (Media Lovin' Toolkit) Photo Slide Video page, no - but maybe there is another approach...

Ok, so, I spent some time looking into melt's commands, and it turns out there is actually a pretty effective way of handling a whole bunch of images (useful if the argument list is too long, or has too many characters, for your terminal to handle).
What you want to do is use -serialise <name of file>.melt, which will store your commands (you can also create this file manually). Then, to execute that file, run melt <name of file>.melt along with any other options you have for your video file.
Example Format:
melt <images and what to do to them> -serialise <name of file>.melt
Example
Create the melt file (with Melt CLI)
melt image1.png out=50 image2.png out=75 -mix 25 -mixer luma image3.png out=75 -mix 25 -mixer luma image3.png out=75 -mix 25 -mixer luma image4.png out=75 -mix 25 -mixer luma <...> -serialise test.melt
.melt file format
test.melt
image1.png
out=50
image2.png
out=75
-mix
25
-mixer
luma
image3.png
out=75
-mix
25
-mixer
luma
image3.png
out=75
-mix
25
-mixer
luma
image4.png
out=75
-mix
25
-mixer
luma
<...>
Run
melt test.melt -profile atsc_1080p_60 -consumer avformat:output.mp4 vcodec=libx264 an=1
Additional Notes
There should be an extra newline character at the end of the .melt file. If there isn't, Exceeded maximum line length (2048) while reading a melt file. will be printed
Notice that -serialise <name of file>.melt itself will not appear in the .melt file
Melt will actually take some time to load the melt file before the encoding process begins
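Incidentally, this also suggests an answer to the globbing question above: since a .melt file holds one token per line, a shell loop can generate it from a glob. A hedged sketch (the out=75 duration and mix length of 25 are placeholders, and pics/*.jpg is a hypothetical directory):
{
  n=0
  for f in pics/*.jpg; do
    printf '%s\n' "$f" out=75           # clip path and its duration, one token per line
    n=$((n+1))
    if [ "$n" -gt 1 ]; then
      printf '%s\n' -mix 25 -mixer luma # transition joining this clip with the previous one
    fi
  done
  echo                                  # the extra trailing newline noted above
} > slideshow.melt
melt slideshow.melt -consumer sdl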


Can I control the output image quality/size with Puppeteer export to PDF?

When using Puppeteer to print a page as PDF, Puppeteer may convert images in that page to a different format.
For example, printing a JPEG image will result in a PDF of (roughly) the same size as the image. That means Puppeteer is using the exact same JPEG image in the generated PDF. The same happens with other formats like PNG and SVG (the output size matches the size of the original images).
However, printing a WebP image will result in a PDF with a much bigger size (10x more than expected). This seems to be because Puppeteer is converting the WebP image into a JPEG/PNG image before generating the PDF.
I am guessing this is because WebP is not supported (maybe not even by the PDF standard, which may be the reason Puppeteer converts the WebP image in the first place).
Is there a way to control this image conversion? In particular, is it possible to set the target format (ideally JPEG) and quality (ideally < 100) to try to maintain the output size of the PDF in the same range as the input WebP image size?
It may help to look, at two levels, at what happens to images when they are saved to PDF. Understand that this is a basic demo, not real world, but it illustrates the considerations.
Upper left we have a 5x5-pixel image, so screen rendering uses blurring to avoid showing images as "sharp"; upper right, a PDF viewer tries to maintain vector sharpness.
So what about different formats? GIF, TIF and PNG (middle line) are lossless and behave in roughly similar fashion; all should maintain colour pixel fidelity in a PDF.
However (lower line), JPEG is lousy at maintaining colour fidelity because it spreads the colours between adjoining pixels, which is "perfect" for fuzzy text or photographs but not much good for flat PDF colours.
OK, moving on: your focus is input to PDF, so what do those look like when stored?
Each may be written in many ways, but let's focus on the most versatile, PNG.
%PDF-1.0
1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj
2 0 obj<</Type/Pages/Count 1/Kids[3 0 R]>>endobj
3 0 obj<</Type/Page/MediaBox[0 0 3.75 3.75]/Rotate 0/Resources<</XObject<</Img3 6 0 R>>>>/Contents 5 0 R/Parent 2 0 R>>endobj
5 0 obj<</Length 34>>
stream
q
3.75 0 0 3.75 0 0 cm
/Img3 Do
Q
endstream
endobj
6 0 obj<</Length 75/Type/XObject/Subtype/Image/Width 5/Height 5/BitsPerComponent 8/SMask 7 0 R/ColorSpace/DeviceRGB>>
stream
ÿ ÿÿÿ   ÿÿÿ ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ ÿÿÿÿ###ÿÿÿ ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ ÿÿÿÿÿÿ ÿÿÿ
endstream
endobj
7 0 obj
<</Length 5/Type/XObject/Subtype/Image/Width 5/Height 5/BitsPerComponent 1/ColorSpace/DeviceGray>>
stream
ÿÿÿÿÿ
endstream
endobj
xref
0 7
0000000000 65536 f
0000000016 00000 n
0000000062 00000 n
0000000114 00000 n
0000000334 00000 n
0000000472 00000 n
0000000555 00000 n
trailer
<</Size 7/Info<</Producer(Me)>>/Root 1 0 R>>
startxref
684
%%EOF
Again, for illustration, this is a non-typical stream, as it shows the bitmap uncompressed; but note the main image is defined by
6 0 obj<</Length 75/Type/XObject/Subtype/Image/Width 5/Height 5/BitsPerComponent 8/SMask 7 0 R/ColorSpace/DeviceRGB>>
So, of interest: it is 5 pixels wide by 5 pixels high, with no hint of how many inches; it is just 8 bits R, 8 bits G, 8 bits B (again, it's only 3 colours), and the alpha is in a separate image (the SMask, object 7). So 3 x 5 x 5 = 75 bytes is the uncompressed storage. Now we can compress in many ways, such as "Flate" (similar to, say, what is used in a zip file), which will convert the stream from lots of ÿs into a more compacted form.
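As a hedged illustration of how well Flate compacts such repetitive bytes (assuming the zlib-flate tool from the qpdf package is installed; 75 bytes of a single repeated value stand in for the bitmap):
head -c 75 /dev/zero | tr '\0' '\377' | zlib-flate -compress | wc -c   # prints the compressed byte count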
Again, there are many encodings; so suppose we first wish to keep the PDF as text, for editing in a text editor:
/Length 72
/Filter [ /ASCIIHexDecode /FlateDecode ]
>>
stream
789cfbcfc0f0ffff7f8605601244a002b0888383038c892a091582e86500009c
663a28>
endstream
Well, that was not much compression: down from 75 only to 72!
Let's use something better, by not sticking to plain text.
6 0 obj
<</Length 36/Type/XObject/Subtype/Image/Width 5/Height 5/BitsPerComponent 8/SMask 7 0 R/ColorSpace/DeviceRGB/Filter/FlateDecode>>
stream
xœû¯ ðÿÿ…`D °ˆƒƒŒ‰*©€P¤   ÞF;è
endstream
OK, much better: we halved the storage, from 72 down to 36. Good: it's small, compact and well formed.
So, what about keeping the JPEG structure? Ahhh - that, while maintaining its lousy nature, needs 730 bytes:
<</Filter/DCTDecode/Type/XObject/Subtype/Image/BitsPerComponent 8/Width 5/Height 5/ColorSpace/DeviceRGB/Length 730>>
stream
ÿØÿà JFIF ` ` ÿÛ C
ÿÛ C
ÿÀ " ÿÄ
ÿÄ µ } !1AQa "q2‘¡#B±ÁRÑð$3br‚
%&'()*456789:CDEFGHIJSTUVWXYZcdefghijstuvwxyzƒ„…†‡ˆ‰Š’“”•–—˜™š¢£¤¥¦§¨©ª²³´µ¶·¸¹ºÂÃÄÅÆÇÈÉÊÒÓÔÕÖ×ØÙÚáâãäåæçèéêñòóôõö÷øùúÿÄ
ÿÄ µ w !1AQ aq"2B‘¡±Á #3RðbrÑ
$4á%ñ&'()*56789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz‚ƒ„…†‡ˆ‰Š’“”•–—˜™š¢£¤¥¦§¨©ª²³´µ¶·¸¹ºÂÃÄÅÆÇÈÉÊÒÓÔÕÖ×ØÙÚâãäåæçèéêòóôõö÷øùúÿÚ ? ëÿ j¯Ú[Á_ >|×õ¯„šgŽ¡ñ–™ý¥mg¨.˜ƒJ
§é€Æ„鮬
´`©þ¬ ㌢Šõiæ¸ì,ëáèUq„*ÖŒRÑ%³I}Ëüõ<ìFŽ*£¯QËšVnÓšWi_E$—ÉÿÙ
endstream
endobj
So this test piece is not real world, but it may serve to help make decisions over the best storage means for different inputs.
My preference is to use PNG where possible for charts and document text, and to use JPEG only when essential, for photos or fuzzy OCR scans.
For your offered sample, JPEG is necessary; but even with quality set high, reducing the size from maximal can suffer collateral damage.
However, it's not very noticeable unless you zoom in close; here at 4x zoom:
Source 58-59 KB
Slightly reduced 50-51 KB
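Back to the original WebP question: I'm not aware of a Puppeteer setting for the embedded format or quality, so a hedged workaround is to pre-convert the WebP to JPEG yourself, at a quality you choose, before the page is printed, so the PDF can embed your JPEG bytes rather than a re-encode (assuming ImageMagick built with WebP support; input.webp is a hypothetical file):
convert input.webp -quality 80 input.jpg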

tesseract didn't get the little labels

I've installed tesseract on my linux environment.
It works when I execute something like
# tesseract myPic.jpg /output
But my pic has some little labels, and tesseract didn't see them.
Is an option available to set a pitch or something like that?
Example of text labels:
With this pic, tesseract doesn't recognize any value...
But with this pic:
I have the following output:
J8
J7A-J7B P7 \
2
40 50 0 180 190
200
P1 P2 7
110 110
\ l
For example, in this case, the 90 (on top left) is not seen by tesseract...
I think it's just an option to define, or something like that, no?
Thx
In order to get accurate results from Tesseract (as well as any OCR engine) you will need to follow some guidelines as can be seen in my answer on this post:
Junk results when using Tesseract OCR and tess-two
Here is the gist of it:
Use a high-resolution image (if needed); 300 DPI is the minimum
Make sure there are no shadows or bends in the image
If there is any skew, you will need to fix the image in code prior to OCR
Use a dictionary to help get good results
Adjust the text size (12 pt font is ideal)
Binarize the image and use image processing algorithms to remove noise
It is also recommended to spend some time training the OCR engine to receive better results as seen in this link: Training Tesseract
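As a rough command-line example of a few of these guidelines (a hedged sketch using ImageMagick; the 50% threshold is a guess to tune per image, and resampling only helps if the source resolution is decent):
convert myPic.jpg -resample 300 -colorspace Gray -threshold 50% prepped.png   # upscale toward 300 DPI, grayscale, binarize
tesseract prepped.png output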
I took the 2 images that you shared and ran some image processing on them using the LEADTOOLS SDK (disclaimer: I am an employee of this company). With the processed images I was able to get better results than you were getting, but since the original images aren't the greatest, it still was not 100%. Here is the code I used to try to fix the images:
//initialize the codecs class
using (RasterCodecs codecs = new RasterCodecs())
{
//load the file
using (RasterImage img = codecs.Load(filename))
{
//Run the image processing sequence starting by resizing the image
double newWidth = (img.Width / (double)img.XResolution) * 300;
double newHeight = (img.Height / (double)img.YResolution) * 300;
SizeCommand sizeCommand = new SizeCommand((int)newWidth, (int)newHeight, RasterSizeFlags.Resample);
sizeCommand.Run(img);
//binarize the image
AutoBinarizeCommand autoBinarize = new AutoBinarizeCommand();
autoBinarize.Run(img);
//change it to 1BPP
ColorResolutionCommand colorResolution = new ColorResolutionCommand();
colorResolution.BitsPerPixel = 1;
colorResolution.Run(img);
//save the image as PNG
codecs.Save(img, outputFile, RasterImageFormat.Png, 0);
}
}
Here are the output images from this process:

Start b clip after mixer transition ends

I'm seeking to mix 2 clips; however, I'd like clip2 to start after the mixer transition ends, not when it begins.
Essentially, this should mix clip1 with only clip2's frame 0.
I was wondering if there was a better alternative to my current workaround:
melt \
clip1.mp4 \
clip2.mp4 in=0 out=0 length=300 \
-mix 300 -mixer luma \
clip2.mp4
Perhaps there is something to pause clip2 at frame 0 for 300 frames?
(I'm doing this with 2 .mlt clips, but voiding the audio_index doesn't seem to work on mlt clips, thus I get a small audio jump for 1 frame, so this workaround isn't ideal)
You cannot set audio_index on .mlt virtual clips because audio_index is a property of the avformat producer, but MLT XML is read by the xml producer.
You can use the hold producer to hold a frame and mute the audio. It defaults to a 25-frame duration, so use out to override it (out is inclusive, so out=299 gives 300 frames, matching -mix 300):
melt clip1.mp4 hold:clip2.mp4 frame=0 out=299 -mix 300 -mixer luma clip2.mp4

What does the Tesseract OCR library require of an image to be able to accurately extract text?

I am using the Tesseract library to extract text from images. The language is Vietnamese. I have two images. The first one is from a website. The second is a screenshot taken from the Wordpad program. They are shown in links below:
1
2
The first one has 95% accuracy.
Bán căn hộ tầng 5 khu tập thể Thành công Bắc, DT 28m2, gần chợ ThànhCông,
số
đỏ, chính chủ, giá 800 triệu.LH:A.Châu, 0979622551,0905685336
The second image is much larger but the accuracy is just about 60%.
Bặn căn hộ tầng ậ khu tập thể Ỉhành gông
Băc. llĩ 28 m2. gân chợ ĩllành Bông. sũ Ilỏ.
chính l:lIlì. giá 800 lriệu. l.ll: A.BhâU,
0979622551, 0905685336
What about the second image do I have to fix to get as accurate text as the first one?
As stated by user898678 in image processing to improve tesseract OCR accuracy,
the following operations can improve OCR accuracy:
fix DPI (if needed); 300 DPI is the minimum
fix the text size (e.g. 12 pt should be ok)
try to fix text lines (deskew and dewarp the text)
try to fix the illumination of the image (e.g. no dark parts of the image)
binarize and de-noise the image
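For example, the deskew and DPI fixes can be done from the command line; a hedged sketch with ImageMagick (-deskew's 40% threshold is a commonly cited default to tune, and screenshot.png is a hypothetical input):
convert screenshot.png -deskew 40% -density 300 -units PixelsPerInch prepped.png
tesseract prepped.png output -l vie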

Melt: how to extract a frame every second?

If I use:
melt -profile square_ntsc movie.flv -consumer avformat:image%05d.jpg s=60x45
I get as many images as there are frames in the movie.
How do I extract one image per second (the frame rate is known)? Equivalently, how do I extract an image every n-th frame?
Found the answer: just add r=1 for the consumer, as in:
melt -profile square_pal movie.flv -consumer avformat:image%05d.jpg r=1
To get 2 images per second, write:
melt -profile square_pal movie.flv -consumer avformat:image%05d.jpg r=2
To get an image every 2 seconds, write:
melt -profile square_pal movie.flv -consumer avformat:image%05d.jpg r=0.5
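More generally, to grab every n-th frame, set r to the source frame rate divided by n. A hedged example, assuming a 25 fps source and wanting every 5th frame (25/5 = 5 images per second):
melt -profile square_pal movie.flv -consumer avformat:image%05d.jpg r=5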