Ghostscript: cropping multiple areas of a PDF in a single read?

I need to crop several different areas of a PDF and save each one as an image. For that I am using the commands below on Windows with Ghostscript 9.50:
"C:\Program Files\gs\gs9.50\bin\gswin64c.exe" ^
-dSAFER -dBATCH -dNOPAUSE ^
-dTextAlphaBits=4 -dGraphicsAlphaBits=4 ^
-dDEVICEWIDTHPOINTS=238 -dDEVICEHEIGHTPOINTS=149.5 ^
-dFIXEDMEDIA -r600 -dDownScaleFactor=2 -sDEVICE=jpeg -o out1.jpeg ^
-c "<</PageOffset [ -64.2 40 ]>> setpagedevice" ^
-sPDFPassword=01011977 ^
-f "E:\PDFs\ECC\PDF_AESTRO.pdf"
"C:\Program Files\gs\gs9.50\bin\gswin64c.exe" ^
-dSAFER -dBATCH -dNOPAUSE ^
-dTextAlphaBits=4 -dGraphicsAlphaBits=4 ^
-dDEVICEWIDTHPOINTS=238 -dDEVICEHEIGHTPOINTS=149.5 ^
-dFIXEDMEDIA -r600 -dDownScaleFactor=2 -sDEVICE=jpeg -o out2.jpeg ^
-c "<</PageOffset [ -308.5 40 ]>> setpagedevice" ^
-sPDFPassword=01011977 ^
-f "E:\PDFs\ECC\PDF_AESTRO.pdf"
The commands above do their job and crop both regions from the PDF.
However, this renders the same PDF twice, which costs time. I want to save that time, because this will run in a Firebase function for my web app.
How can I crop both areas to images in a single read of the PDF?

To be blunt about it: you can't do what you are asking. To get the same file rendered in different ways, you need to render it once for each way you want it rendered.
In passing: you no longer need to specify -dSAFER (since the 9.50 release) because that's the default. If you specify -o (instead of -sOutputFile), as you have done, then you don't need -dBATCH or -dNOPAUSE; -o includes both of those.
Personally I wouldn't bother with the GraphicsAlphaBits or TextAlphaBits given that you are rendering to JPEG, but that's a matter of personal preference.
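Since each region still needs its own render, one way to keep the two invocations consistent is to build them from a single helper. The following is a minimal sketch in Python, not part of the original answer: the function name crop_command, the file name input.pdf and the assumption that gswin64c is on PATH are all hypothetical. It keeps the options from the question but drops -dSAFER, -dBATCH and -dNOPAUSE, as suggested above.

```python
def crop_command(pdf, out, offset_x, offset_y, password=None,
                 gs="gswin64c", width_pt=238, height_pt=149.5):
    """Build the Ghostscript argument list for one crop region.

    gs is assumed to be resolvable on PATH (hypothetical setup).
    """
    args = [
        gs,
        "-dTextAlphaBits=4", "-dGraphicsAlphaBits=4",
        f"-dDEVICEWIDTHPOINTS={width_pt}",
        f"-dDEVICEHEIGHTPOINTS={height_pt}",
        "-dFIXEDMEDIA", "-r600", "-dDownScaleFactor=2",
        "-sDEVICE=jpeg",
        "-o", out,  # -o already implies -dBATCH and -dNOPAUSE
    ]
    if password is not None:
        args.append(f"-sPDFPassword={password}")
    # The PostScript fragment shifts the page so the wanted area
    # falls inside the fixed media size.
    args += ["-c", f"<</PageOffset [ {offset_x} {offset_y} ]>> setpagedevice",
             "-f", pdf]
    return args


# One command per region; the offsets are the ones from the question.
regions = [("out1.jpeg", -64.2, 40), ("out2.jpeg", -308.5, 40)]
commands = [crop_command("input.pdf", out, x, y) for out, x, y in regions]
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be handed to subprocess.run as-is; the PDF is still read once per region.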

Related

Failed in generating Tesseract traineddata

I'm using Tesseract v5.0.1.20220118 on Windows 10 to train a font that only has the letters "P" and "Q".
When I get to the step
mftraining -F font_properties.txt -U unicharset -O normal.unicharset pq.normal.exp0.tr
The pffmtable file is not generated.
And when I run cntraining pq.normal.exp0.tr
It shows me
Reading pq.normal.exp0.tr ...
Clustering ...
N == sizeof(Cluster->Mean):Error:Assert failed:in file ../../../src/classify/cluster.cpp, line 2526
Why does it go wrong? How can I fix it?
Only inttemp and shapetable were generated, but the tutorial says there should be four files: shapetable, inttemp, pffmtable and normproto. I wonder whether this is because the font only has the letters "P" and "Q", but I have no idea how to solve it.
Please read the docs:
https://tesseract-ocr.github.io/tessdoc/#training-for-tesseract-5
Use the right tools:
https://github.com/tesseract-ocr/tesstrain

How to compress ico file on imagemagick or other

I created a 32x32 PNG in Photoshop; the exported file is 500 bytes.
Converted to ico using
magick convert .\favicon.png favicon.ico
And it became 5kb.
Question:
Is there a compression flag in ImageMagick, or another way to compress favicon.ico?
I just tried @dan-mašek's suggestion and it definitely works better than ImageMagick.
With the PNGs I was working with, ImageMagick gave me a 5.4K .ico file despite asking for it to use .ico's support for embedded PNGs for the compression (apparently it ignores you for sizes under 256x256) while Pillow got it down to 1.8K.
Here's how I went about crunching down my favicons based on an existing PNG-optimizing shell script I cooked up years ago:
#!/bin/sh
optimize_png() {
    for X in "$@"; do
        echo "---- Using pngcrush to strip irrelevant chunks from $X ----"
        # Because I don't know if OptiPNG considers them all "metadata"
        pngcrush -ow -q -rem alla -rem cHRM -rem gAMA -rem iCCP -rem sRGB \
            -rem time "$X" | egrep -v '^[ \|]\|'
    done
    echo "---- Using OptiPNG to optimize delta filters ----"
    # ...and strip all "metadata"
    optipng -clobber -o7 -zm1-9 -strip all -- "$@" 2>&1 | grep -v "IDAT size ="
    echo "---- Using AdvanceCOMP to zopfli-optimize DEFLATE ----"
    advpng -z4 "$@"
}
optimize_png 16.png 32.png
python3 << EOF
from PIL import Image
i16 = Image.open('16.png')
i32 = Image.open('32.png')
i32.save('src/favicon.ico', sizes=[(16, 16), (32, 32)], append_images=[i16])
EOF
Just be aware that:
pngcrush and advpng don't take -- as arguments, so you have to prefix ./ onto relative paths which might start with -.
.save in PIL must be called on the largest image so, if you have a dynamic list of images, you probably want something like this:
images.sort(key=lambda x: x.size)
images[-1].save('favicon.ico', sizes=[x.size for x in images], append_images=images)

hard-coded output without expansion in Snakefile

I have Snakefile as following:
SAMPLES, = glob_wildcards("data/{sample}_R1.fq.gz")
rule all:
    input:
        expand("samtools_sorted_out/{sample}.raw.snps.indels.g.vcf", sample=SAMPLES),
        expand("samtools_sorted_out/combined_gvcf")
rule combine_gvcf:
    input: "samtools_sorted_out/{sample}.raw.snps.indels.g.vcf"
    output: directory("samtools_sorted_out/combined_gvcf")
    params: gvcf_file_list="gvcf_files.list",
            gatk4="/storage/anaconda3/envs/exome/share/gatk4-4.1.0.0-0/gatk-package-4.1.0.0-local.jar"
    shell: """
        java -DGATK_STACKTRACE_ON_USER_EXCEPTION=true \
            -jar {params.gatk4} GenomicsDBImport \
            -V {params.gvcf_file_list} \
            --genomicsdb-workspace-path {output}
        """
When I test it with a dry run, I get this error:
RuleException in line 335 of /data/yifangt/exomecapture/Snakefile:
Wildcards in input, params, log or benchmark file of rule combine_gvcf cannot be determined from output files:
'sample'
There are two points where I need some help:
The {output} is a folder that will be created by the shell part;
The {output} folder name is hard-coded because the command line requires it (and its contents are unknown ahead of time).
The problem seems to be that {output} has no wildcard expansion, whereas {input} does.
How should I handle this situation? Thanks a lot!

Parsing JSON with BusyBox tools

I'm working on a blog theme for Hugo installable on Android (BusyBox via Termux) and plan to create a BusyBox Docker image and copy my theme and the hugo binary to it for use on ARM.
Theme releases are archived and made available on NPM, and the tools available on BusyBox have allowed me to reliably parse the version from the JSON metadata:
meta=$(wget -qO - https://registry.npmjs.org/package/latest)
vers=$(echo "$meta" | egrep -o "\"version\".*[^,]*," | cut -d ',' -f1 | cut -d ':' -f2 | tr -d '" ')
Now I would like to copy the dist value from the meta into a text file for use in Hugo:
"dist": {
"integrity": "sha512-3MH2/UKYPjr+CTC85hWGg/N3GZmSlgBWXzdXHroDfJRnEmcBKkvt1oiadN8gzCCppqCQhwtmengZzg0imm1mtg==",
"shasum": "a159699b1c5fb006a84457fcdf0eb98d72c2eb75",
"tarball": "https://registry.npmjs.org/after-dark/-/after-dark-6.4.1.tgz",
"fileCount": 98,
"unpackedSize": 5338189
},
Above pretty-printed for clarity. The actual metadata is compressed.
Is there a way I can reuse the version parsing logic above to also pull the dist field value?
Proper, robust parsing requires tools like jq, where it could be as simple as jq '.version' ip.txt and jq '.dist' ip.txt
You could use sed, but use it at your own risk:
$ sed -n 's/.*"version":"\([^"]*\).*/\1/p' ip.txt
6.4.1
$ sed -n 's/.*\("dist":{[^}]*}\).*/\1/p' ip.txt
"dist":{"integrity":....
....}
The -n option disables automatic printing.
The p modifier on the s command prints only when the substitution succeeds, so the output is empty (instead of the entire input line) when something goes wrong.
.*"version":"\([^"]*\).* matches the entire line, capturing the data between the double quotes after the version tag. You'll have to adjust the regex if whitespace or other valid JSON formatting is allowed.
.*\("dist":{[^}]*}\).* matches the entire line, capturing the data from "dist":{ up to the first } that follows, so it is not suitable if the value itself can contain }.
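To see what those two patterns capture, and where the second one breaks, they can be tried out with the same regular expressions in Python. This is only a sketch for experimenting: Python is not part of BusyBox, and the sample strings below are hypothetical compact metadata shaped like the excerpt in the question.

```python
import re

# Hypothetical compact metadata (no whitespace, as the real response is compressed).
meta = ('{"name":"after-dark","version":"6.4.1",'
        '"dist":{"shasum":"a159699b1c5fb006a84457fcdf0eb98d72c2eb75",'
        '"fileCount":98,"unpackedSize":5338189}}')

# Same idea as: sed -n 's/.*"version":"\([^"]*\).*/\1/p'
version = re.search(r'"version":"([^"]*)', meta).group(1)

# Same idea as: sed -n 's/.*\("dist":{[^}]*}\).*/\1/p'
dist = re.search(r'("dist":\{[^}]*\})', meta).group(1)

print(version)
print(dist)

# The caveat from the answer: a nested object inside "dist" ends the match early.
nested = '{"dist":{"a":{"b":1},"c":2}}'
truncated = re.search(r'("dist":\{[^}]*\})', nested).group(1)
print(truncated)  # stops at the first closing brace
```

The last line shows why a real JSON parser is the safer choice as soon as the value can contain nested objects.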

Find function's start offset in ELF

Suppose I have function fn somewhere within the .text section of an ELF64 executable. Is there a way to know at which offset (in bytes) from the start of the ELF file the fn function is located? Note that I don't need to know at which VA it was relocated at linking time, but its position within the ELF file.
Generally yes, if you can parse the ELF file directly or combine output from tools like objdump and readelf.
More specifically: you can get the file offset and virtual address of your .text section with 'readelf -S file'. Write those down.
You can then list the symbols with 'readelf -s file'. As long as your executable is not stripped and your function is visible (not static or in an anonymous namespace), you should find your function and its virtual address there.
Thus you can calculate the offset via
fn symbol offset = fn symbol VA - .text VA + .text offset
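As a worked example of that formula, here is a small Python sketch. The three numbers are hypothetical values of the kind you would copy from the readelf -S and readelf -s output; the function name symbol_file_offset is likewise made up for illustration.

```python
def symbol_file_offset(symbol_va: int, section_va: int, section_offset: int) -> int:
    """File offset = symbol VA - section VA + section file offset."""
    return symbol_va - section_va + section_offset


# Hypothetical values, not taken from a real binary.
text_va = 0x401000      # .text "Address" column from readelf -S
text_offset = 0x1000    # .text "Offset" column from readelf -S
fn_va = 0x401136        # "Value" column for fn from readelf -s

fn_offset = symbol_file_offset(fn_va, text_va, text_offset)
print(hex(fn_offset))  # prints 0x1136
```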
That's assuming you want to do it "offline" with common tools. It's more difficult if you don't have access to the unstripped ELF file, and since only a part of the ELF file remains in memory, it is probably not possible without adding some information with "offline" tricks.
Simply use objdump's -F option:
user@phoenix-amd64:~$ objdump -D -F /opt/phoenix/i486/heap-xxx -D | grep main
08048630 <__libc_start_main@plt> (File Offset: 0x630):
8048679: e8 b2 ff ff ff call 8048630 <__libc_start_main@plt> (File Offset: 0x630)
080487d5 <main> (File Offset: 0x7d5):
The answer by Norbert Lange works for the functions that are listed in the symbol table of the ELF file. But static functions will not be present there, so even if e.g. GDB could find them (by using DWARF debug info), readelf -s won't.
In this case, you can use GDB. For example, let's find the offset of xfce_displays_helper_normalize_crtc in /usr/bin/xfsettingsd (that was my actual use case, thus this obscure choice of an example).
$ gdb -q -ex 'p &xfce_displays_helper_normalize_crtc' -ex q xfsettingsd
Reading symbols from xfsettingsd...
Reading symbols from /usr/lib/debug/.build-id/b2/2ad9713642253d4d7a6f94acf0174ccfe3d487.debug...
$1 = (void (*)(XfceRRCrtc *, XfceDisplaysHelper *)) 0x11e80 <xfce_displays_helper_normalize_crtc>
Note that here we only load the file with GDB and don't let the program start. We then use the p command (print in full form) to get the address. So in my case, the function is at offset 0x11e80.
In some cases GDB will resolve the offset to a virtual address even before we start or starti the program. This happens, in particular, on x86-32. In this case we can simply subtract the virtual address of the file image, given by readelf -l:
$ readelf -l /bin/sleep | grep ' VirtAddr \|\<LOAD *0x[0-9a-f]\+\>'
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x08048000 0x08048000 0x05230 0x05230 R E 0x1000
In the example above, the virtual address of the file image is 0x8048000, which would have to be subtracted from virtual address of the function if GDB happens to output it instead of the offset.
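The subtraction can be sketched in Python as follows. The helper names are hypothetical, and the parser assumes the single-line LOAD layout shown above (64-bit readelf output wraps each LOAD entry over two lines and would need different handling). The sketch also adds the segment's file Offset, which is 0 in the example and so makes no difference there.

```python
def parse_load_line(line):
    """Return (file_offset, vaddr, filesz) from a one-line LOAD entry."""
    fields = line.split()
    if not fields or fields[0] != "LOAD":
        raise ValueError("not a LOAD program header line")
    # Columns: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
    return int(fields[1], 16), int(fields[2], 16), int(fields[4], 16)


def va_to_file_offset(va, segments):
    """Map a virtual address to a file offset via the LOAD segment containing it."""
    for offset, vaddr, filesz in segments:
        if vaddr <= va < vaddr + filesz:
            return va - vaddr + offset
    raise ValueError(f"address {va:#x} is not backed by any LOAD segment")


# The LOAD line from the readelf -l output above.
load_line = "LOAD 0x000000 0x08048000 0x08048000 0x05230 0x05230 R E 0x1000"
segment = parse_load_line(load_line)

# 0x08048630 is a hypothetical function address inside that segment.
print(hex(va_to_file_offset(0x08048630, [segment])))  # prints 0x630
```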