How to compress an ICO file with ImageMagick or another tool

I created a 32x32 PNG in Photoshop that exported at about 500 bytes.
I converted it to ICO using
magick convert .\favicon.png favicon.ico
and it became 5 KB.
Question: is there a compression flag in ImageMagick, or another way to compress favicon.ico?

I just tried @Dan Mašek's suggestion and it definitely works better than ImageMagick.
With the PNGs I was working with, ImageMagick gave me a 5.4K .ico file despite my asking it to use .ico's support for embedded PNGs for compression (apparently it ignores you for sizes under 256x256), while Pillow got it down to 1.8K.
Here's how I went about crunching down my favicons based on an existing PNG-optimizing shell script I cooked up years ago:
#!/bin/sh
optimize_png() {
    for X in "$@"; do
        echo "---- Using pngcrush to strip irrelevant chunks from $X ----"
        # Because I don't know if OptiPNG considers them all "metadata"
        pngcrush -ow -q -rem alla -rem cHRM -rem gAMA -rem iCCP -rem sRGB \
            -rem time "$X" | egrep -v '^[ \|]\|'
    done

    echo "---- Using OptiPNG to optimize delta filters ----"
    # ...and strip all "metadata"
    optipng -clobber -o7 -zm1-9 -strip all -- "$@" 2>&1 | grep -v "IDAT size ="

    echo "---- Using AdvanceCOMP to zopfli-optimize DEFLATE ----"
    advpng -z4 "$@"
}
optimize_png 16.png 32.png
python3 << EOF
from PIL import Image
i16 = Image.open('16.png')
i32 = Image.open('32.png')
i32.save('src/favicon.ico', sizes=[(16, 16), (32, 32)], append_images=[i16])
EOF
Just be aware that:
pngcrush and advpng don't accept -- as an end-of-options separator, so you have to prefix ./ onto relative paths which might start with -.
.save in PIL must be called on the largest image, so if you have a dynamic list of images you probably want something like this:
images.sort(key=lambda x: x.size)
images[-1].save('favicon.ico', sizes=[x.size for x in images], append_images=images)
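Here's a self-contained version of that pattern, as a rough sketch: it assumes the already-optimized PNGs sit in the current directory and reuses the python3-heredoc style from the script above (the glob and output name are illustrative, not from the original post):
python3 << EOF
import glob
from PIL import Image

# Illustrative input set: every PNG in the current directory
images = [Image.open(p) for p in sorted(glob.glob('*.png'))]

# .save must be called on the largest image, so sort by size first
images.sort(key=lambda x: x.size)
images[-1].save('favicon.ico', sizes=[x.size for x in images], append_images=images)
EOF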

Related

Ghostscript: Cropping multiple areas of a PDF in a single pass?

I need to crop multiple areas of a PDF (each with a different offset) and save each one as an image. For that I am using the commands below on my Windows system with Ghostscript 9.50:
"C:\Program Files\gs\gs9.50\bin\gswin64c.exe" ^
-dSAFER -dBATCH -dNOPAUSE ^
-dTextAlphaBits=4 -dGraphicsAlphaBits=4 ^
-dDEVICEWIDTHPOINTS=238 -dDEVICEHEIGHTPOINTS=149.5 ^
-dFIXEDMEDIA -r600 -dDownScaleFactor=2 -sDEVICE=jpeg -o out1.jpeg ^
-c "<</PageOffset [ -64.2 40 ]>> setpagedevice" ^
-sPDFPassword=01011977 ^
-f "E:\PDFs\ECC\PDF_AESTRO.pdf"
"C:\Program Files\gs\gs9.50\bin\gswin64c.exe" ^
-dSAFER -dBATCH -dNOPAUSE ^
-dTextAlphaBits=4 -dGraphicsAlphaBits=4 ^
-dDEVICEWIDTHPOINTS=238 -dDEVICEHEIGHTPOINTS=149.5 ^
-dFIXEDMEDIA -r600 -dDownScaleFactor=2 -sDEVICE=jpeg -o out2.jpeg ^
-c "<</PageOffset [ -308.5 40 ]>> setpagedevice" ^
-sPDFPassword=01011977 ^
-f "E:\PDFs\ECC\PDF_AESTRO.pdf"
The above commands do their job and crop both parts from that PDF.
But this renders the same PDF twice, which costs time, and I want to save that time for faster processing, as this will be used in a Firebase function for my web app.
How can I crop both areas to images in a single read of the PDF?
To be blunt about it: you can't do what you are asking. To get the same file rendered differently, you need to render it each different way you want it rendered.
In passing: you no longer need to specify -dSAFER (since the 9.50 release) because that's the default. And if you specify -o (instead of -sOutputFile), as you have done, then you don't need -dBATCH or -dNOPAUSE; -o implies both of those.
Personally I wouldn't bother with the GraphicsAlphaBits or TextAlphaBits given that you are rendering to JPEG, but that's a matter of personal preference.
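For what it's worth, the first invocation from the question, trimmed along those lines (same device, offsets and paths; only the redundant flags removed, not re-tested here):
"C:\Program Files\gs\gs9.50\bin\gswin64c.exe" ^
-dTextAlphaBits=4 -dGraphicsAlphaBits=4 ^
-dDEVICEWIDTHPOINTS=238 -dDEVICEHEIGHTPOINTS=149.5 ^
-dFIXEDMEDIA -r600 -dDownScaleFactor=2 -sDEVICE=jpeg -o out1.jpeg ^
-c "<</PageOffset [ -64.2 40 ]>> setpagedevice" ^
-sPDFPassword=01011977 ^
-f "E:\PDFs\ECC\PDF_AESTRO.pdf"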

Split large directory into subdirectories

I have a directory with about 2.5 million files and is over 70 GB.
I want to split this into subdirectories, each with 1000 files in them.
Here's the command I've tried using:
i=0; for f in *; do d=dir_$(printf %03d $((i/1000+1))); mkdir -p $d; mv "$f" $d; let i++; done
That command works for me on a small scale, but I can leave it running for hours on this directory and it doesn't seem to do anything.
I'm open to doing this any way via the command line: Perl, Python, etc. Just whatever way would be the fastest to get this done...
I suspect that if you checked, you'd notice your program was actually moving the files, albeit really slowly. Launching a program is rather expensive (at least compared to making a system call), and you do so three or four times per file! As such, the following should be much faster:
perl -e'
   my $base_dir_qfn = ".";
   my $i = 0;
   my $dir_qfn;

   opendir(my $dh, $base_dir_qfn)
      or die("Can'\''t open dir \"$base_dir_qfn\": $!\n");

   while (defined( my $fn = readdir($dh) )) {
      next if $fn =~ /^(?:\.\.?|dir_\d+)\z/;

      my $qfn = "$base_dir_qfn/$fn";

      if ($i % 1000 == 0) {
         $dir_qfn = sprintf("%s/dir_%03d", $base_dir_qfn, int($i/1000)+1);
         mkdir($dir_qfn)
            or die("Can'\''t make directory \"$dir_qfn\": $!\n");
      }

      rename($qfn, "$dir_qfn/$fn")
         or do {
            warn("Can'\''t move \"$qfn\" into \"$dir_qfn\": $!\n");
            next;
         };

      ++$i;
   }
'
Note: ikegami's helpful Perl-based answer is the way to go - it performs the entire operation in a single process and is therefore much faster than the Bash + standard utilities solution below.
A Bash-based solution needs to avoid loops in which external utilities are called in order to perform reasonably.
Your own solution calls two external utilities and creates a subshell in each loop iteration, which means that you'll end up creating about 7.5 million processes(!) in total.
The following solution avoids loops, but, given the sheer number of input files, will still take quite a while to complete (you'll end up creating 4 processes for every 1000 input files, i.e., ca. 10,000 processes in total):
printf '%s\0' * | xargs -0 -n 1000 bash -O nullglob -c '
    dirs=( dir_*/ )
    dir=dir_$(printf %04d $(( 1 + ${#dirs[@]} )))
    mkdir "$dir"; mv "$@" "$dir"' -
printf '%s\0' * prints a NUL-separated list of all files in the dir.
Note that since printf is a Bash builtin rather than an external utility, the max. command-line length as reported by getconf ARG_MAX does not apply.
xargs -0 -n 1000 invokes the specified command with chunks of 1000 input filenames.
Note that xargs -0 is nonstandard, but supported on both Linux and BSD/OSX.
Using NUL-separated input robustly passes filenames without fear of inadvertently splitting them into multiple parts, and even works with filenames with embedded newlines (though such filenames are very rare).
bash -O nullglob -c executes the specified command string with option nullglob turned on, which means that a globbing pattern that matches nothing will expand to the empty string.
The command string counts the output directories created so far in order to determine the name of the next output dir (the next higher index), then creates that dir and moves the current batch of (up to) 1000 files into it.
If the directory is not in use, I suggest the following:
find . -maxdepth 1 -type f | split -l 1000 -d -a 5
This will create about 2500 chunk files named x00000 onward (5 digits just to be safe, although 4 would work too). You can then move the 1000 files listed in each chunk to a corresponding directory.
Perhaps set -o noclobber to eliminate the risk of overwrites in case of a name clash.
To move the files, it's easier to use brace expansion to iterate over the chunk names:
for c in x{00000..02500}; do
    d="d$c"
    mkdir "$d"
    cat "$c" | xargs -I f mv f "$d"
done
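One caveat: set -o noclobber only guards shell redirections (>), not mv itself. If you want the moves themselves to refuse to overwrite existing files, GNU and BSD mv support -n; a variant of the loop above using it (illustrative, not from the original answer):
for c in x{00000..02500}; do
    d="d$c"
    mkdir -p "$d"
    # -n: never overwrite an existing file in the target directory
    xargs -I f mv -n f "$d" < "$c"
done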
Moving files around is always a challenge. IMHO all the solutions presented so far have some risk of destroying your files. This may be because the challenge sounds simple, but there is a lot to consider and to test when implementing it.
We must also not underestimate the efficiency of the solution as we are potentially handling a (very) large number of files.
Here is a script that I carefully and intensively tested on my own files. But of course, use at your own risk!
This solution:
is safe with filenames that contain spaces.
does not use xargs -L because this will easily result in "Argument list too long" errors
is based on Bash 4 and does not depend on awk, sed, tr etc.
scales well with the number of files to move.
Here is the code:
if [[ "${BASH_VERSINFO[0]}" -lt 4 ]]; then
echo "$(basename "$0") requires Bash 4+"
exit -1
fi >&2
opt_dir=${1:-.}
opt_max=1000
readarray files <<< "$(find "$opt_dir" -maxdepth 1 -mindepth 1 -type f)"
moved=0 dirnum=0 dirname=''
for ((i=0; i < ${#files[#]}; ++i))
do
if [[ $((i % opt_max)) == 0 ]]; then
((dirnum++))
dirname="$opt_dir/$(printf "%02d" $dirnum)"
fi
# chops the LF printed by "find"
file=${files[$i]::-1}
if [[ -n $file ]]; then
[[ -d $dirname ]] || mkdir -v "$dirname" || exit
mv "$file" "$dirname" || exit
((moved++))
fi
done
echo "moved $moved file(s)"
For example, save this as split_directory.sh. Now let's assume you have 2001 files in some/dir:
$ split_directory.sh some/dir
mkdir: created directory some/dir/01
mkdir: created directory some/dir/02
mkdir: created directory some/dir/03
moved 2001 file(s)
Now the new reality looks like this:
some/dir contains 3 directories and 0 files
some/dir/01 contains 1000 files
some/dir/02 contains 1000 files
some/dir/03 contains 1 file
Calling the script again on the same directory is safe and returns almost immediately:
$ split_directory.sh some/dir
moved 0 file(s)
Finally, let's take a look at the special case where we call the script on one of the generated directories:
$ time split_directory.sh some/dir/01
mkdir: created directory 'some/dir/01/01'
moved 1000 file(s)
real 0m19.265s
user 0m4.462s
sys 0m11.184s
$ time split_directory.sh some/dir/01
moved 0 file(s)
real 0m0.140s
user 0m0.015s
sys 0m0.123s
Note that this test ran on a fairly slow, veteran computer.
Good luck :-)
This is probably slower than a Perl program (1 minute for 10,000 files) but it should work with any POSIX-compliant shell.
#! /bin/sh
nd=0
nf=0
/bin/ls | \
while read file
do
    case $(expr $nf % 10) in
    0)
        nd=$(/usr/bin/expr $nd + 1)
        dir=$(printf "dir_%04d" $nd)
        mkdir $dir
        ;;
    esac
    mv "$file" "$dir/$file"
    nf=$(/usr/bin/expr $nf + 1)
done
With bash, you can use arithmetic expansion $((...)).
And of course, this idea can be improved by using xargs; it should not take longer than about 45 seconds for 2.5 million files.
nd=0
ls | xargs -L 1000 echo | \
while read cmd
do
    nd=$((nd+1))
    dir=$(printf "dir_%04d" $nd)
    mkdir $dir
    mv $cmd $dir
done
I would use the following from the command line:
find . -maxdepth 1 -type f | split -l 1000
for i in `ls x*`
do
    mkdir dir$i
    mv `cat $i` dir$i 2>/dev/null &
done
The key is the "&", which runs each mv statement in the background.
Thanks to karakfa for the split idea.

Tesseract training achieves a worse result than without training

I used the Windows installer of tesseract-ocr 3.02.02 (I didn't find a newer one for 3.04). My image is a 600 dpi JPEG (3507x4960) of a scanned blank "certificate of incapacity for work". The OCR result without training is much more accurate than after training. So what am I doing wrong?
This way I build my box file:
SET LANG=arbeitsunfaehigkeit
SET FONTNAME=hausarzt
SET TESSLANG=%LANG%.%FONTNAME%.exp0
tesseract %TESSLANG%.jpg %TESSLANG% -l deu batch.nochop makebox
Using jTessBoxEditor I fixed every box by hand. Then I started the training:
SET LANG=arbeitsunfaehigkeit
SET FONTNAME=hausarzt
SET TESSLANG=%LANG%.%FONTNAME%.exp0
tesseract %TESSLANG%.jpg %TESSLANG% -l deu nobatch box.train
unicharset_extractor %TESSLANG%.box
shapeclustering -F font_properties -U unicharset %TESSLANG%.tr
mftraining -F font_properties -U unicharset -O %LANG%.unicharset %TESSLANG%.tr
cntraining %TESSLANG%.tr
MOVE inttemp %LANG%.inttemp
MOVE normproto %LANG%.normproto
MOVE pffmtable %LANG%.pffmtable
MOVE shapetable %LANG%.shapetable
combine_tessdata %LANG%.
COPY %LANG%.traineddata %TESSERACT_HOME%\tessdata /Y
The OCR without training (achieving the best results) is done like this:
SET LANG=arbeitsunfaehigkeit
SET FONTNAME=hausarzt
SET TESSLANG=%LANG%.%FONTNAME%.exp0
tesseract %TESSLANG%.jpg without_training -l deu
Using the traineddata:
SET LANG=arbeitsunfaehigkeit
SET FONTNAME=hausarzt
SET TESSLANG=%LANG%.%FONTNAME%.exp0
tesseract %TESSLANG%.jpg with_training -l %LANG%
Maybe I am wrong, but I expect a perfect result (I use the same JPEG for training and for OCR).
Here is the first part of without_training.txt:
Paul Albrechts Verlag, 22952 Lütjensee Bei verspäteter Vorlage droht Krankengeldverlust!
Krankenkasse bzw. Kostenträger
Name, Vorname des Versicherten
geb. am
Kassen—Nr. Versicherten—Nr. Status
Betriebsstätten-Nr. Arzt—Nr. Datum
And the first part of with_training.txt:
Pau/A/brechrs Ver/ag, 22952 Lüfjensee Be! verspäteter vor!age droht Krankenge!dver!ust!
Krankenkasse bzw. Kostenträger
Name, Vorname des Versicherten
geb. am
Kassen-Nr. Versicherten-Nr. status
Betriebsstätten-Nr. Arzt-Nr. Datum
In my case adding the language "deu" did the trick:
tesseract %TESSLANG%.jpg with_training -l %LANG%+deu
instead of
tesseract %TESSLANG%.jpg with_training -l %LANG%

How to rename a lot of gibberish image names at once?

I recently downloaded plenty of pictures (33,000) from a server which hosts a website that I run. Many of the pictures have gibberish names, such as "Ч‘ЧђЧ ЧЁ-280x150.jpg".
These names were originally supposed to be in Hebrew, but when I downloaded them from the server their names became gibberish. I could of course just go over all the images and rename them using some gibberish translator, but I can't because there are thousands of images.
So I'm looking for a way to convert all the images with bad naming to images in Hebrew.
I don't have my gibberish-to-Hebrew translator with me, but this will give your images a number instead of a name...
#!/bin/bash
i=1
for f in *.jpg
do
    newname=$(printf "%06d" $i)
    echo mv "$f" "${newname}.jpg"
    ((i++))
done
Sample output:
mv 1500x1000.jpg 000001.jpg
mv 3000x2000.jpg 000002.jpg
mv a.jpg 000003.jpg
mv green.jpg 000004.jpg
mv new.jpg 000005.jpg
mv red.jpg 000006.jpg
Remove the word echo if you like the results.
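If you'd rather try to recover the original Hebrew names instead of renumbering, a tool like convmv can sometimes undo this kind of mojibake, but only if you can work out which encodings were involved. The encodings below are pure guesses for illustration; convmv does a dry run by default and only renames when --notest is added:
# Preview how the names would change (the -f/-t encodings here are assumptions)
convmv -f cp1251 -t utf8 -r /path/to/images
# If the preview looks right, apply it for real
convmv -f cp1251 -t utf8 -r --notest /path/to/images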

How to Convert Regex Pattern Match to Lowercase for URL Standardization/Tidying

I am currently trying to convert all links and files and tags on my site from UPPERCASE.ext and CamelCase.ext to lowercase.ext.
I can match the links in pages using a regular expression match for href="[^"]*" and src="[^"]*"
This seems to work fine for identifying the link and images in the HTML.
However, what I need to do is take each match and run a ToLowercase() function on it. Since I have a lot of pages that I'd like to parse through, I'm looking to make a short shell script that will run on a specified directory, pattern-match the specified regexes, and perform a lowercase operation on the matches.
Perl one-liner to rename all regular files to lowercase:
perl -le 'use File::Find; find({wanted=>sub{-f && rename($_, lc)}}, "/path/to/files");'
If you want to be more specific about what files are renamed you could change -f to a regex or something:
perl -le 'use File::Find; find({wanted=>sub{/\.(txt|htm|blah)$/i && rename($_, lc)}}, "/path/to/files");'
EDIT: Sorry, after rereading the question I see you also want to replace occurrences within files as well:
find /path/to/files -name "*.html" -exec perl -pi -e 's/\b(src|href)="(.+)"/$1="\L$2"/gi;' {} \;
EDIT 2: Try this one, as the find command uses + instead of \; which is more efficient since multiple files are passed to perl at once (thanks to @ikegami from another post). It also handles both ' and " around the URL. Finally, it uses {} instead of // for substitutions since you are substituting URLs (maybe the /s in the URL are confusing perl or your shell?). It shouldn't matter, and I tried both on my system with the same effect (both worked fine), but it's worth a shot:
find . -name "*.html" -exec perl -pi -e \
'$q=qr/"|\x39/; s{\b(src|href)=($q?.+$q?)\b}{$1=\L$2}gi;' {} +
PS: I also have a Macbook and tested these using bash shell with Perl versions 5.8.9 and 5.10.0.
With bash, you can declare a variable to only hold lower case values:
declare -l varname
read varname <<< "This Is LOWERCASE"
echo $varname # ==> this is lowercase
Or, you can convert a value to lowercase (bash version 4, I think)
x="This Is LOWERCASE"
echo ${x,,} # ==> this is lowercase
you want this?
kent$ echo "aBcDEF"|sed 's/.*/\L&/g'
abcdef
or this
kent$ echo "aBcDEF"|awk '$0=tolower($0)'
abcdef
with your own regex:
kent$ echo 'FOO src="htTP://wWw.GOOGLE.CoM" BAR BlahBlah'|sed -r 's/src="[^"]*"/\L&/g'
FOO src="http://www.google.com" BAR BlahBlah
You could use sed with -i (in-place edit):
sed -i'' -re's/(href|src)="[^"]*"/\L&/g' /path/to/files/*
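If the pages live in subdirectories rather than a single folder, one way to apply the same substitution recursively (a sketch assuming GNU sed, since -i and \L are GNU extensions) is to combine it with find, much like the Perl variants above:
find /path/to/files -type f -name '*.html' \
-exec sed -i -E 's/(href|src)="[^"]*"/\L&/g' {} +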