How to insert a TIFF into PostScript? - tiff

I would like to include a TIFF image in my PostScript, similarly to an EPS or a JPEG. But it fails on creation, stating that the file's ImageType isn't JPEG.
Is this possible?

You can't use a TIFF directly in PostScript, because PostScript doesn't support the TIFF file format. You can, however, use a PostScript program to read a TIFF file and process it as an image. For example, see:
Conversion of TIFF to PDF with Ghostscript
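
As a complementary sketch (this is not the Ghostscript approach linked above): PostScript's image operator consumes decoded pixel samples, so another option is to decode the TIFF outside PostScript and embed the raw samples. The Python below assumes you already have 8-bit grayscale samples in row-major order, top row first; decoding the TIFF itself is left to a library of your choice. The function name gray_to_eps is hypothetical.

```python
def gray_to_eps(width, height, samples):
    """Wrap raw 8-bit grayscale samples (bytes, top row first) in a small EPS program."""
    if len(samples) != width * height:
        raise ValueError("expected exactly width * height samples")
    # Hex-encode the samples and wrap them into short lines;
    # readhexstring ignores the line breaks.
    hex_data = samples.hex()
    hex_lines = [hex_data[i:i + 64] for i in range(0, len(hex_data), 64)]
    lines = [
        "%!PS-Adobe-3.0 EPSF-3.0",
        f"%%BoundingBox: 0 0 {width} {height}",
        "gsave",
        f"/picstr {width} string def",   # buffer holding one row of samples
        f"{width} {height} scale",       # unit square -> width x height points
        f"{width} {height} 8 [{width} 0 0 -{height} 0 {height}]",
        "{ currentfile picstr readhexstring pop } image",
        *hex_lines,
        "grestore",
        "showpage",
    ]
    return "\n".join(lines) + "\n"
```

For example, gray_to_eps(2, 2, bytes([0, 255, 128, 64])) returns a string that can be written to a .eps file. Colour images or other bit depths would need a different image setup.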

Related

Convert and parse digitally created PDF file to tesseract tsv file directly from PDF source code

I am trying to convert and parse a digitally created PDF file into a tesseract TSV file, and to use this as ground truth for testing tesseract OCR performance on my PDF-to-TIF-to-TSV pipeline.
Any idea how I can achieve this task, i.e. convert and parse a digitally created PDF file into a tesseract TSV file, without OCR or anything?
So far, I can use packages such as fitz (PyMuPDF) and pdfminer to extract text. Using fitz (PyMuPDF) gives me sentences with their box locations in the PDF file (x_1, y_1, x_2, y_2). However, I don't see any way yet to turn this into tesseract-like TSV output. See https://blog.tomrochette.com/tesseract-tsv-format.
Any advice would be greatly appreciated. Thanks! :)
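
A minimal sketch of one possible approach, under stated assumptions: the input is a list of word tuples shaped like the ones PyMuPDF's page.get_text("words") returns, i.e. (x0, y0, x1, y1, text, block_no, line_no, word_no) in PDF points with a top-left origin. The function name words_to_tsv, the fixed par_num of 1, and the conf value of 100 are my own choices for illustration. Only word-level rows (level 5) are written, whereas tesseract also emits rows for pages, blocks, paragraphs and lines.

```python
TSV_HEADER = ["level", "page_num", "block_num", "par_num", "line_num",
              "word_num", "left", "top", "width", "height", "conf", "text"]

def words_to_tsv(words, page_num=1, dpi=300):
    """Turn word boxes given in PDF points into tesseract-style TSV text."""
    scale = dpi / 72.0  # PDF points -> pixels of the image rendered at `dpi`
    rows = ["\t".join(TSV_HEADER)]
    for x0, y0, x1, y1, text, block_no, line_no, word_no in words:
        fields = [
            5,                       # level 5 = word
            page_num,
            block_no + 1,            # tesseract counts from 1
            1,                       # par_num: assumed, no paragraph index available
            line_no + 1,
            word_no + 1,
            round(x0 * scale),       # left
            round(y0 * scale),       # top
            round((x1 - x0) * scale),  # width
            round((y1 - y0) * scale),  # height
            100,                     # conf: assumed, text comes from the PDF itself
            text,
        ]
        rows.append("\t".join(str(f) for f in fields))
    return "\n".join(rows) + "\n"
```

The dpi argument should match the resolution used when rendering the PDF to TIF, otherwise the boxes will not line up with the OCR output.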

How to convert HTML file into Framemaker Interchange Format(.mif) file?

I want to mark index entries and cross-references like FrameMaker does.
FrameMaker can export an .fm file to .htm and .mif files.
I have analyzed how the index and cross-references appear in the .htm and .mif files after exporting them from FrameMaker.
Now my system will produce an .htm file, and I can manage to mark the index and cross-references like FrameMaker does.
I want FrameMaker to retain the index and cross-references marked by my system.
But there is no way to import or open HTML files directly in FrameMaker.
We can import a .mif file into FrameMaker.
So is there any way to convert HTML files into .mif (FrameMaker Interchange Format)?
There is one option. I know it's not a foolproof solution for this problem, but it can save you some effort:
1) Save the HTML file in RTF format (using MS Word or OpenOffice).
2) Open that RTF file in FM.
3) FM accepts the RTF file and converts it into an .fm file.
4) Save the .fm file in .mif format.
Note: some data loss may happen in this conversion. I have tried it for markers; it works, but it is not a complete solution.
All the best!!
You can open the .htm files in Structured FrameMaker and then save them to .mif. This will produce less loss in graphics, for sure.

Dataloss when saving images binary data 'as file'

I'm kinda a programming noobie but here it goes:
I opened an image file with the program binaryviewer (http://www.proxoft.com/BinaryViewer.aspx) to see its binary code.
Then I used its copy function to copy the binary data, first as a .txt file, then as a .jpeg file. The resulting files are considerably smaller than the original image file and are not readable as images at all.
Why are the resulting images so much smaller? What kind of data is getting lost in this process and are there ways to prevent that?
Are there specific ways to recreate the image from a file containing only the 0s and 1s of an original image file?
Whatever binary viewer you are using, it just looks at the raw bytes as stored in the file on the disk.
1) When saving 'as text', the viewer itself determines in which format it writes the binary information to a text file. You should look that up in its documentation.
2) It is very unlikely that it has knowledge about the structure of jpg files. So again, when you save to a .jpg file, it chooses by itself how to output the bytes and dumps them to a file named .jpg, but the result does not have the on-disk structure of a .jpg. For any image viewer trying to read the file, it's just garbage.
But as I said in my comments, without knowing what 'binary viewer' you are talking about it's not possible to be more specific.
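
To illustrate the general point (this says nothing about what the viewer in question actually writes): a dump of 0s and 1s only becomes an image again if every group of 8 bits is turned back into one byte and the result is written in binary mode. A minimal sketch, with hypothetical helper names:

```python
def bytes_to_bits(data):
    """Render bytes as a string of '0' and '1' characters, 8 per byte."""
    return "".join(f"{b:08b}" for b in data)

def bits_to_bytes(bits):
    """Rebuild the original bytes from a string of '0'/'1' characters."""
    bits = "".join(bits.split())  # ignore spaces and line breaks in the dump
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
```

The rebuilt bytes would then be saved with open(path, "wb"). Writing the text dump itself to a file named .jpg does not work, because an image viewer expects the raw bytes, not their textual representation.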

Multipage with GnuWin32

tiffSplit.exe from the GnuWin32 library splits TIFF images. Is it possible to combine several single-page files into a multipage TIFF with the GnuWin32 library?
The solution is to use tiffcp:
tiffcp -c lzw <source files> <dest file>
The -c option specifies the compression to use. If you don't use any compression, the output file can be significantly larger than the input files if they are compressed.
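
If the merge needs to be scripted, the same command can be driven from Python. This is only a sketch: it assumes tiffcp is on the PATH, and the helper names tiffcp_command and merge_tiffs are hypothetical.

```python
import subprocess

def tiffcp_command(sources, dest, compression="lzw"):
    """Build the argument list for: tiffcp -c <compression> <sources...> <dest>."""
    if not sources:
        raise ValueError("at least one source file is required")
    return ["tiffcp", "-c", compression, *sources, dest]

def merge_tiffs(sources, dest, compression="lzw"):
    """Run tiffcp; raises CalledProcessError if tiffcp reports a failure."""
    subprocess.run(tiffcp_command(sources, dest, compression), check=True)
```

The pages end up in the order the source files are listed, so sort the list first if the page order matters.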

box-api Download a File: how to save raw text response to file with correct encoding

I implemented downloading a file. I receive raw string data. I save this data to a file, but can't open it, because the file is invalid.
I tried to download a jpg image.
I think this is a result of incorrectly converting this data to bytes.
What encoding do I need to use?
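
One possible cause, assuming the HTTP client decoded the binary response body into a text string: JPEG data is not text, so a decode/encode round trip through a text encoding such as UTF-8 can change the bytes. The sketch below uses a few synthetic bytes to show the effect; it does not call the Box API, and the helper name roundtrip is hypothetical.

```python
# First bytes of a typical JPEG file, used here as stand-in binary data.
jpeg_like = bytes([0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10])

def roundtrip(data, encoding):
    """Decode bytes to a string and encode them back, as a text-based client might."""
    text = data.decode(encoding, errors="replace")
    return text.encode(encoding, errors="replace")

utf8_result = roundtrip(jpeg_like, "utf-8")      # invalid sequences get replaced
latin1_result = roundtrip(jpeg_like, "latin-1")  # every byte maps to one character

print(utf8_result == jpeg_like)    # False: the data was damaged
print(latin1_result == jpeg_like)  # True: the bytes survived
```

The safer route is usually to ask the HTTP client for the response as raw bytes and write them with open(path, "wb"), so that no text encoding is involved at all.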