Our C++ application uses CUPS to print a PostScript file generated by the XRT XrtTblVaDrawPS command. But when I print 2 copies and set the CUPS Collate option, the output is not collated.
Our project uses the XRT Motif library to generate a PostScript file from a Motif table layout. The file generated by XrtTblVaDrawPS was printed through CUPS, but during testing the CUPS Collate option appeared not to work when we were printing more than 1 copy. Web searches did not turn up any reason why the PS file was not collating, but after a lot of experimentation we found the cause.
One of the options passed to the XrtTblVaDrawPS call that generates the PS file was "XRTTBL_PS_NUM_COPIES, 2", which sets how many copies the PostScript file prints. In our CUPS class we were also calling cupsAddOption("copies", "2", ...) and cupsAddOption("Collate", "True", ...) (see the examples below). It turns out that the CUPS "copies" option kills collation when it is set to 2. As with the PostScript/CUPS orientation conflict, you need to set the CUPS copies value to 1 to get collation to work, because the PostScript file already knows it is going to print, for example, 2 copies. If you don't want the output collated, set the CUPS copies value to 2. If you generate your PostScript file some other way you may not hit this problem, but you will if you use the XrtTblVaDrawPS call.
pgs = XrtTblVaDrawPS(myTable, fp,
XRTTBL_PS_NUM_COPIES, num, /* <= set to 2 */
XRTTBL_PS_CELL_RANGE, rng,
XRTTBL_PS_COLOR, clr,
XRTTBL_PS_ORIENTATION, ornt,
XRTTBL_PS_SCALE, FIT_TO_PAGE_HEIGHT,
XRTTBL_PS_SHOW_ROW_LABELS, XRTTBL_PS_ALL,
XRTTBL_PS_SHOW_FROZEN_ROWS, XRTTBL_PS_ALL,
XRTTBL_PS_SHOW_COL_LABELS, XRTTBL_PS_ALL,
XRTTBL_PS_SHOW_FROZEN_COLS, XRTTBL_PS_ALL,
XRTTBL_PS_PAPERSIZE_WIDTH, media_sz.width,
XRTTBL_PS_PAPERSIZE_HEIGHT, media_sz.length,
XRTTBL_PS_MARGIN_LEFT, 1.00,
XRTTBL_PS_MARGIN_RIGHT, 1.00,
XRTTBL_PS_MARGIN_TOP, 0.75,
XRTTBL_PS_MARGIN_BOTTOM, 0.75,
XRTTBL_PS_HEADER_FONT, "Adobe 10",
XRTTBL_PS_HEADER, hdr,
XRTTBL_PS_HEADER_MARGIN, 0.55,
XRTTBL_PS_FOOTER_FONT, "Adobe 10",
XRTTBL_PS_FOOTER, "Page #",
XRTTBL_PS_FOOTER_MARGIN, 0.25, NULL);
myNumOptions = cupsAddOption("Collate", "True", myNumOptions, &myOptions);
myNumOptions = cupsAddOption("copies", oss.str().c_str(), myNumOptions, &myOptions);
When oss.str().c_str() is "2", collation fails and I get (1-1-2-2).
When oss.str().c_str() is "1", collation works and I get (1-2-1-2).
When oss.str().c_str() is "2" and the CUPS Collate option is "False", I get (1-1-2-2), as expected.
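The rule found above can be written down as a small helper. This is only a sketch in Python of the behaviour observed in this post, not anything documented by CUPS or XRT; the function name cups_options is made up for illustration, and the returned pairs are meant to be passed on to cupsAddOption as name/value strings.

```python
def cups_options(ps_copies: int, collate: bool) -> dict:
    """Choose CUPS job options for a PostScript file that already embeds
    its own copy count (e.g. via XRTTBL_PS_NUM_COPIES).

    Hypothetical helper: it encodes only the results reported above.
    """
    if ps_copies < 1:
        raise ValueError("ps_copies must be at least 1")
    if collate:
        # The PostScript file already repeats the document, so CUPS must
        # be told to print it once or the collation is lost.
        return {"copies": "1", "Collate": "True"}
    # Uncollated output: the combination that gave (1-1-2-2) above.
    return {"copies": str(ps_copies), "Collate": "False"}


if __name__ == "__main__":
    for name, value in cups_options(2, collate=True).items():
        print(name, value)
```

Keeping the decision in one place makes it harder to set "copies" in both the PostScript generator and the CUPS job by accident.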
I have an image created from an old fax document (the font is specific). Generally Tesseract works pretty well with this input, except in one case: when a line starts with many leading asterisks ('*'), it is ignored.
The OCR result differs depending on the psm:
psm 1: an empty page
psm 6: For queries please contact NA, KRKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK KKK KK KK KK
In every case the title "comment" is skipped.
But when I manually removed all the '*' characters from the image in Paint, the OCR worked fine. I have no idea how to run the OCR without preprocessing the image. Can someone explain this?
Try this: tesseract 9UIKs.png - --psm 4 --oem 0
Which produces:
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxkkkk COMMENT kkkkxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
For queries p1ease contact NA.
XXXXXKKKKXXXVXKKKKKXXXXXKXXXXXXXXXXXXKXXXXXXXXXXXXXXXXKKXXXXXXXXXXXXXXXXXKXXXX.
You will need a language model that supports the legacy engine (available from https://github.com/tesseract-ocr/tessdata).
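If you call Tesseract from a script, the same command line can be assembled like this. It is a minimal Python sketch: the helper name tesseract_cmd is invented here, and it assumes the tesseract binary is on your PATH and that the installed traineddata supports the legacy engine (--oem 0).

```python
import shlex
import subprocess


def tesseract_cmd(image, psm=4, oem=0, output="-"):
    """Build the argument list for the command suggested above.

    output="-" makes Tesseract write the recognized text to stdout.
    """
    return ["tesseract", image, output, "--psm", str(psm), "--oem", str(oem)]


def run_ocr(image, psm=4, oem=0):
    """Run Tesseract and return the recognized text (needs tesseract installed)."""
    result = subprocess.run(tesseract_cmd(image, psm, oem),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Show the command that would be executed for the image from the question.
    print(shlex.join(tesseract_cmd("9UIKs.png")))
```

Building the arguments as a list avoids shell quoting problems and makes it easy to loop over several psm values when one of them drops a line.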
I am trying to train Tesseract 4 on specific pictures (to read multimeters with 7-segment displays).
Please note that I am aware of the already trained data from Arthur Augusto at https://github.com/arturaugusto/display_ocr but I need to train Tesseract on my own data.
To train Tesseract, I followed different tutorials (such as https://robipritrznik.medium.com/recognizing-vehicle-license-plates-on-images-using-tesseract-4-ocr-with-custom-trained-models-4ba9861595e7 or https://pretius.com/how-to-prepare-training-files-for-tesseract-ocr-and-improve-characters-recognition/)
but I always get a problem when running the shapeclustering command with my own data.
(With example data such as https://github.com/tesseract-ocr/tesseract/issues/1174#issuecomment-338448972, everything works fine.)
When I run the shapeclustering command, I get this output (screenshot).
My shape_table is then empty, so the training cannot be effective...
With the example data it works fine and the shape_table is properly filled.
I am guessing that I have an issue with box file generation. Here is my process for creating the box file:
I use the
tesseract imageFileName.tif imageFileName batch.nochop makebox
command to generate the box file and then I edit it with JtessboxEditor.
So I can't see what is wrong with my .box/.tif data pair.
Have a good day & thanks for helping me
Adrien
Here is my full batch script for training after having generated and edited box files.
set name=sev7.exp0
set shortName=sev7
echo Run Tesseract for Training..
tesseract.exe %name%.tif %name% nobatch box.train
echo Compute the Character Set..
unicharset_extractor.exe %name%.box
shapeclustering -F font_properties -U unicharset -O %shortName%.unicharset %name%.tr
mftraining -F font_properties -U unicharset -O %shortName%.unicharset %name%.tr
echo Clustering..
cntraining.exe %name%.tr
echo Rename Files..
rename normproto %shortName%.normproto
rename inttemp %shortName%.inttemp
rename pffmtable %shortName%.pffmtable
rename shapetable %shortName%.shapetable
echo Create Tessdata..
combine_tessdata.exe %shortName%.
echo. & pause
OK, so I finally managed to train Tesseract.
The solution is to add a --psm parameter to the command
tesseract.exe %name%.tif %name% nobatch box.train
as
tesseract.exe %name%.%typeFile% %name% --psm %psm% nobatch box.train
Note that the possible psm values are:
REM pagesegmode values are:
REM 0 = Orientation and script detection (OSD) only.
REM 1 = Automatic page segmentation with OSD.
REM 2 = Automatic page segmentation, but no OSD, or OCR
REM 3 = Fully automatic page segmentation, but no OSD. (Default)
REM 4 = Assume a single column of text of variable sizes.
REM 5 = Assume a single uniform block of vertically aligned text.
REM 6 = Assume a single uniform block of text.
REM 7 = Treat the image as a single text line.
REM 8 = Treat the image as a single word.
REM 9 = Treat the image as a single word in a circle.
REM 10 = Treat the image as a single character.
REM 11 = Sparse text. Find as much text as possible in no particular order.
REM 12 = Sparse text with OSD.
REM 13 = Raw line. Treat the image as a single text line bypassing hacks that are Tesseract-specific.
Found at https://github.com/tesseract-ocr/tesseract/issues/434
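For anyone driving the training from a script instead of a batch file, here is a rough Python sketch of the fixed command. The helper name box_train_cmd is made up, and the psm value used in the example call is only an illustration; the value that works depends on your images.

```python
VALID_PSM = range(0, 14)  # 0..13, as in the list above


def box_train_cmd(name, psm, image_ext="tif"):
    """Build the box.train command with an explicit page segmentation mode.

    name is the base name shared by the image and the .box file,
    e.g. "sev7.exp0".
    """
    if psm not in VALID_PSM:
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    image = f"{name}.{image_ext}"
    return ["tesseract", image, name, "--psm", str(psm), "nobatch", "box.train"]


if __name__ == "__main__":
    # psm=6 is an arbitrary example value, not a recommendation.
    print(" ".join(box_train_cmd("sev7.exp0", psm=6)))
```

Checking the range up front catches a mistyped psm before Tesseract produces an empty .tr file.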
Such a simple question to have wasted an hour or two of my time. The Octave docs allude to setting the interpreter to tex but never say how to do it. I've looked online and through Stack Overflow and haven't found how to do this. I've also looked at the .octaverc files and have seen nothing that would indicate how to turn on the tex edit function. I am using GNU Octave version 4.0.0 on Debian. Please help.
Gary Roach
The interpreter property is set to "tex" by default for axes, line, text, patch and surface objects, so changing the interpreter only makes sense if you want to switch to "none":
set (findobj (gcf, "-property", "interpreter"), "interpreter", "none")
This sets "interpreter" to "none" for all children of the current figure.
If you want some fancy LaTeX in your plots and not only simple TeX commands, you can render it with LaTeX:
close all
graphics_toolkit fltk
sombrero ();
title ("The sombrero function:")
fcn = "$z = \\frac{\\sin\\left(\\sqrt{x^2 + y^2}\\right)}{\\sqrt{x^2 + y^2}}$";
text (0.5, -10, 1.8, fcn, "fontsize", 20);
print -depslatexstandalone sombrero
## process generated files with latex
system ("latex sombrero.tex");
## dvi to ps
system ("dvips sombrero.dvi");
## convert to png
system ("gs -dNOPAUSE -dBATCH -dSAFER -sDEVICE=png16m -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r100x100 -dEPSCrop -sOutputFile=sombrero.png sombrero.ps")
which gives:
I'm trying to convert an HTML file to a PDF using the Mac terminal.
I found a similar post and used the code it provided, but I kept getting nothing. I did not find the output file anywhere when I issued this command:
./soffice --headless --convert-to pdf --outdir /home/user ~/Downloads/*.odt
I'm using Mac OS X 10.8.5.
Can someone show me a terminal command line that I can use to convert HTML to PDF?
OK, here is an alternative way to convert (X)HTML to PDF on a Mac command line. It does not use LibreOffice at all and should work on all Macs.
This method (ab)uses a filter from the Mac's print subsystem, called xhtmltopdf. This filter is usually not meant to be used by end-users but only by the CUPS printing system.
However, if you know about it, know where to find it and know how to run it, there is no problem with doing so:
The first thing to know is that it is not in any desktop user's $PATH. It is in /usr/libexec/cups/filter/xhtmltopdf.
The second thing to know is that it requires a specific syntax and order of parameters, otherwise it won't run. When called with no parameters at all (or with the wrong number of parameters), it emits a small usage hint:
$ /usr/libexec/cups/filter/xhtmltopdf
Usage: xhtmltopdf job-id user title copies options [file]
Most of these parameter names show that the tool is clearly related to printing. The command requires at least 5 parameters, with an optional 6th. If only 5 parameters are given, it reads its input from <stdin>; otherwise it reads from the 6th parameter, a file name. It always writes its output to <stdout>.
The only CLI params which are interesting to us are number 5 (the "options") and the (optional) number 6 (the input file name).
When we run it on the command line, we have to supply 5 dummy or empty parameters first, before we can put the input file's name. We also have to redirect the output to a PDF file.
So, let's try it:
/usr/libexec/cups/filter/xhtmltopdf "" "" "" "" "" my.html > my.pdf
Or, alternatively (this is faster to type and easier to check for completeness, using 5 dummy parameters instead of 5 empty ones):
/usr/libexec/cups/filter/xhtmltopdf 1 2 3 4 5 my.html > my.pdf
While we are at it, we could try to apply some other CUPS print subsystem filters to the output: /usr/libexec/cups/filter/cgpdftopdf looks like one that could be interesting. This additional filter expects the same number and order of parameters, like all CUPS filters.
So this should work:
/usr/libexec/cups/filter/xhtmltopdf 1 2 3 4 5 my.html \
| /usr/libexec/cups/filter/cgpdftopdf 1 2 3 4 "" \
> my.pdf
However, piping the output of xhtmltopdf into cgpdftopdf is only interesting if we try to apply some "print options". That is, we need to come up with some settings in parameter no. 5 which achieve something.
Looking up the CUPS command line options on the CUPS web page suggests a few candidates:
-o number-up=4
-o page-border=double-thick
-o number-up-layout=tblr
do look like they could be applied while doing a PDF-to-PDF transformation. Let's try:
/usr/libexec/cups/filter/xhtmltopdf 1 2 3 4 5 my.html \
| /usr/libexec/cups/filter/cgpdftopdf 1 2 3 4 \
"number-up=4 page-border=double-thick number-up-layout=tblr" \
> my.pdf
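Since it is easy to miscount the positional parameters, here is a small Python sketch that builds the argument list from the usage line (job-id user title copies options [file]). The helper name cups_filter_argv and its default dummy values are my own invention; only the parameter order comes from the usage hint shown earlier.

```python
import shlex


def cups_filter_argv(filter_path, options="", input_file=None,
                     job_id="1", user="user", title="title", copies="1"):
    """Build argv for a CUPS filter: job-id user title copies options [file].

    The first four values are dummies here; "options" is parameter 5 and
    the optional input file is parameter 6.
    """
    argv = [filter_path, job_id, user, title, copies, options]
    if input_file is not None:
        argv.append(input_file)  # without it the filter reads stdin
    return argv


if __name__ == "__main__":
    first = cups_filter_argv("/usr/libexec/cups/filter/xhtmltopdf",
                             input_file="my.html")
    second = cups_filter_argv(
        "/usr/libexec/cups/filter/cgpdftopdf",
        options="number-up=4 page-border=double-thick number-up-layout=tblr")
    # Print the pipeline so it can be pasted into a terminal.
    print(shlex.join(first), "|", shlex.join(second), "> my.pdf")
```

The point of the helper is that the options string always lands in position 5, so it cannot be mistaken for the input file name.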
Here are two screenshots of results I achieved with this method. The two input HTML files were identical apart from one line: the line that referenced the CSS file used for rendering the HTML.
As you can see, the xhtmltopdf filter is able to (at least partially) take into account CSS settings when it converts its input to PDF:
Starting with 3.6.0.1, you would need unoconv on the system to convert documents.
Using unoconv with MacOS X
LibreOffice 3.6.0.1 or later is required to use unoconv under MacOS X. This is the first version distributed with an internal python script that works. No version of OpenOffice for MacOS X (3.4 is the current version) works because the necessary internal files are not included inside the application.
I just had the same problem, but I found this LibreOffice help post. It seems that headless mode won't work if you've got LibreOffice (the usual GUI version) running too. The fix is to add an -env option, e.g.
libreoffice "-env:UserInstallation=file:///tmp/LibO_Conversion" \
--headless \
--invisible \
--convert-to csv file.xls
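The same workaround can be scripted. Below is a hedged Python sketch that builds the command with a throwaway profile directory so it does not clash with a running GUI instance. The function name soffice_convert_cmd is hypothetical, and the binary name "soffice" is an assumption; on a Mac you will probably need the full path to soffice inside the LibreOffice application bundle.

```python
import tempfile
from pathlib import Path


def soffice_convert_cmd(src, target="pdf", outdir=".",
                        profile_dir=None, binary="soffice"):
    """Build a headless conversion command with its own user profile."""
    if profile_dir is None:
        # A fresh directory keeps this run separate from the GUI profile.
        profile_dir = tempfile.mkdtemp(prefix="LibO_Conversion_")
    profile_uri = Path(profile_dir).resolve().as_uri()
    return [
        binary,
        f"-env:UserInstallation={profile_uri}",
        "--headless",
        "--invisible",
        "--convert-to", target,
        "--outdir", outdir,
        src,
    ]


if __name__ == "__main__":
    print(soffice_convert_cmd("file.xls", target="csv"))
```

Pass the resulting list to subprocess.run(cmd, check=True) to perform the conversion.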
I have a csv file which has 5 entries on every row. Every entry is whether a network packet is triggered or not. The last entry in every row is the size of packet. Every row = time elapsed in ms.
e.g. row
1 , 0 , 1 , 2 , 117
How do I plot a graph where, for example, the x axis is the row number and the y axis is the value of the 1st entry in every row?
This should get you started:
set datafile separator ","
plot 'infile' using 0:1
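To check what "using 0:1" actually selects, here is a rough Python equivalent that pairs each row number with one column of the file (gnuplot's pseudo-column 0 is the zero-based row index). The function name column_series is made up, and the second sample row is invented purely for the demonstration.

```python
import csv
import io


def column_series(text, column=1):
    """Return (row_index, value) pairs for one 1-based column of CSV text,
    mirroring gnuplot's "using 0:<column>"."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = (row for row in reader if row)  # skip blank lines
    return [(index, float(row[column - 1].strip()))
            for index, row in enumerate(rows)]


if __name__ == "__main__":
    # First row is the example from the question; the second is invented.
    sample = "1 , 0 , 1 , 2 , 117\n0 , 1 , 1 , 0 , 64\n"
    print(column_series(sample, column=1))
    print(column_series(sample, column=5))
```

Changing the column argument corresponds to changing the second number in "using 0:1".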
You can also plot to a png file using gnuplot (which is free):
Terminal commands:
gnuplot> set title '<title>'
gnuplot> set ylabel '<yLabel>'
gnuplot> set xlabel '<xLabel>'
gnuplot> set grid
gnuplot> set term png
gnuplot> set output '<Output file name>.png'
gnuplot> plot '<fromfile.csv>'
Note: you always need to give the right extension (.png here) in set output.
It is also possible that the output is not drawn as lines, because your data is not continuous. To fix this, simply change the 'plot' line to:
plot '<fromfile.csv>' with lines lt -1 lw 2
More line styling options (dashes, line color, etc.) at:
http://gnuplot.sourceforge.net/demo_canvas/dashcolor.html
gnuplot is available in most Linux distros via the package manager (e.g. on an apt-based distro, run apt-get install gnuplot)
gnuplot is available on Windows via Cygwin
gnuplot is available on macOS via Homebrew (run brew install gnuplot)