Including unicharambigs in the [lang].traineddata file (Tesseract)

I'm having trouble training Tesseract OCR for Kannada fonts (Lohit Kannada and Kedage), specifically with numerals.
For example, 0 is recognized as 8 (and ನ as ವ).
I need help including the unicharambigs file; the documentation on GitHub only describes its format. My output.txt file has not changed despite including the unicharambigs file.
Suppose [lang] corresponds to kan. Will the following command include the unicharambigs file in the kan.traineddata file?
combine_tessdata kan.
In case it doesn't, I'd appreciate any help on how to proceed.

This is difficult to answer without knowing which versions of Tesseract and kan.traineddata you're using.
You can unpack kan.traineddata to see the version of kan.unicharambigs included in it, edit the file, and then recombine it.
See https://github.com/tesseract-ocr/tesseract/blob/master/doc/combine_tessdata.1.asc for the command syntax.
Use the -u option to unpack:
-u .traineddata PATHPREFIX: Unpacks the .traineddata using the provided prefix.
Use the -o option to overwrite unicharambigs:
-o .traineddata FILE…: Overwrites the specified components of the .traineddata file with those provided on the command line.
Please note that https://github.com/tesseract-ocr/langdata/blob/master/kan/kan.unicharambigs seems to be a copy of eng.unicharambigs.
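As a rough sketch of the unpack, edit, and overwrite cycle described above, the two combine_tessdata invocations could be assembled from Python as follows. The helper names (unpack_cmd, overwrite_cmd, run) are hypothetical, and the sketch assumes that combine_tessdata is on the PATH and that the unpacked component is named after the prefix, i.e. kan.unicharambigs for the prefix "kan.".

```python
import subprocess
from typing import List


def unpack_cmd(traineddata: str, prefix: str) -> List[str]:
    """Build the argv that unpacks every component using the given path prefix."""
    return ["combine_tessdata", "-u", traineddata, prefix]


def overwrite_cmd(traineddata: str, *components: str) -> List[str]:
    """Build the argv that overwrites the listed components in the traineddata file."""
    return ["combine_tessdata", "-o", traineddata, *components]


def run(cmd: List[str]) -> None:
    """Run a command and raise if it fails (requires combine_tessdata to be installed)."""
    subprocess.run(cmd, check=True)


# Intended order of operations (not executed here):
#   run(unpack_cmd("kan.traineddata", "kan."))                  # writes kan.unicharambigs, ...
#   ... edit kan.unicharambigs by hand ...
#   run(overwrite_cmd("kan.traineddata", "kan.unicharambigs"))  # puts the edited file back
```

Keeping the command construction separate from execution makes it easy to print and check the exact command line before touching the traineddata file.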

How to read contents of PC Sampling Continuous output (dat file)?

I am looking at the code samples provided in /usr/local/cuda-11.8/extras/CUPTI/samples/pc_sampling_continuous/. I am trying to get the PC stall reasons from an executable using CUPTI, without modifying the source code. pc_sampling_continuous seems to do exactly that.
When I run the command
./libpc_sampling_continuous.pl --collection-mode 1 --sampling-period 7 --file-name pcsampling.dat --verbose --app "./executable_name"
it outputs a .dat file.
To read this .dat file, /usr/local/cuda-11.8/extras/CUPTI/samples/pc_sampling_utility/ contains a utility. It is run with the command
./pc_sampling_utility --file-name 1_pcsampling.dat
and it produces human-readable output.
I have two problems with this:
Every line reports lineNumber:0, fileName: ERROR_NO_CUBIN, dirName: . The stalls are shown, but without correlation to both the source line number and the SASS, the output is of no use.
The README says that I should use the cubin file for source correlation. I am able to generate the cubin file (by running cuobjdump -xelf all ./executable_name and renaming the result to 1.cubin), but I do not understand how to pass this cubin file to pc_sampling_utility together with the .dat file.
Any help is appreciated.

Get IFC schema version

Opening an *.ifc file, we can find FILE_SCHEMA in the header, for example:
HEADER;
...
FILE_SCHEMA (('IFC4'));
ENDSEC;
We are downloading an IFC file stream and would like to know its file schema version.
Is it possible to get this information via the Data Management API?
This is already an old post, but just to mention that for those who download the file before any other operation: once downloaded, the following command can be used (on a Unix-like environment) to get exactly the IFC schema (e.g. "IFC2X3", "IFC4"):
grep "^FILE_SCHEMA" file.ifc | cut -d"'" -f2
Of course, this command can be integrated into a program written in, for example, Node.js (using child_process.exec), or in any other programming language. Note that this is usually faster than streaming the file and searching in it, or even using a language-specific library to "grep" the file, especially for big IFC files.
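Where spawning grep is not an option, a minimal Python sketch of the same idea is shown below. It reads only the beginning of the file, on the assumption that the HEADER section (and therefore FILE_SCHEMA) sits within the first 64K characters; the function names and that limit are illustrative choices, not part of any IFC tooling.

```python
import re
from typing import List

# Matches FILE_SCHEMA (('IFC4')); with optional whitespace, capturing the inner list.
_SCHEMA_RE = re.compile(
    r"FILE_SCHEMA\s*\(\s*\((.*?)\)\s*\)\s*;", re.IGNORECASE | re.DOTALL
)


def ifc_schemas_from_text(header_text: str) -> List[str]:
    """Return the schema identifiers found in a FILE_SCHEMA entry, or [] if none."""
    match = _SCHEMA_RE.search(header_text)
    if match is None:
        return []
    # The entry holds one or more quoted names, e.g. 'IFC4' or 'IFC2X3'.
    return re.findall(r"'([^']*)'", match.group(1))


def read_ifc_schemas(path: str, max_chars: int = 65536) -> List[str]:
    """Read only the start of the file (assumed to contain the header) and parse it."""
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        head = fh.read(max_chars)
    return ifc_schemas_from_text(head)
```

Because only the first part of the file is read, the cost does not grow with the size of the IFC file, which addresses the same concern as the grep approach for big files.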

How to convert a .cov file to html report using bullshtml.exe?

I have a coverage report and I am trying to generate an HTML report from it using bullshtml.exe. The coverage file is at "c:/temp/bullseye.cov" and bullshtml.exe is in the same folder, "c:/temp/bullshtml.exe", so I run:
c:/temp>bullshtml.exe -f "bullseye.cov" "c:/temp"
When I run this on the command line, I get the error
"please provide the html output directory".
If I run only
c:/temp>bullshtml.exe c:/temp
I do get files inside the temp folder, but that does not make sense to me because I want to convert bullseye.cov to HTML using bullshtml.exe. Please suggest how to get this done and what I am doing wrong: is the command wrong, or is it something else?
You need to add "-v" before specifying the output directory, as below:
bullshtml.exe -f "bullseye.cov" -v "c:/temp"

Recognize Bash files with no extension in PhpStorm

I'm using the latest version of PhpStorm (7, I think) and want file type support for files that have no extension, so a pattern such as *.extension cannot match them. I tried the pattern *, which works, but it puts all of my files in Bash highlighting.
Does anyone have a solution for this that doesn't require the .sh extension?
Edit:
Bash files are recognized with the extensions .sh and .bash, and that works nicely. What I want is to set a default file type for files with no extension. If I add .* or * to the list of Bash file patterns, all my files are recognized as Bash files.
I hope that's clearer; sorry for any mistakes in my English.
It may seem weird, but you can try explicitly listing the files you're using by name.
I'm not sure of your use case, but I needed this for git hooks, and there aren't that many names for git hooks, so it's not hard to list them :)
For reference:
Preferences > Editor > File Types > Bourne Again Shell, then add each file name as a pattern.

Transform a CSV file to XML using XSLT

I would like to generate an XML file from an existing CSV file using XSLT.
Can anybody tell me which command to use? I don't know the command to convert the file.
Suppose my CSV file is named source.csv and the output template is temp.xsl.
Command:
msxsl source.csv temp.xsl -o result.xml
Is this the right command or not?
Here is an XSL file to convert CSV to XML: http://andrewjwelch.com/code/xslt/csv/csv-to-xml_v2.html
To run it from the command line, the instructions say to download Saxon and use:
java -cp saxon.jar net.sf.saxon.Transform -o output.xml -it main csv-to-xml.xslt pathToCSV=file:/C:/dev/test.csv
Here are the parts of that command line explained:
java: The Java executable (Java is the programming language in which Saxon is written).
-cp saxon.jar: saxon.jar contains the XSLT processor code; -cp stands for "classpath" and tells Java where to find it.
-o output.xml: Where the output should go. That is result.xml in your example.
-it main csv-to-xml.xslt: Specifies the XSLT file (csv-to-xml.xslt) and the entry point within it (main).
pathToCSV=file:/C:/dev/test.csv: Your input CSV file (source.csv in your example, but formatted as a URL).
I do not have sufficient reputation to comment on Stephen's answer.
The transform as described depends heavily on the XSL stylesheet, which defines a parameter named pathToCSV and a template named "main".
The command will not work as Stephen has written it with version 9 of the Home Edition; when I run the command as written, I get the response "Command line option -o requires a value". However, this form of the command works as of the day of this posting:
java -cp saxon9he.jar net.sf.saxon.Transform -o:csvfile.xml -it:main "csv2xml.xsl" pathToCSV="csvfile.csv"
The linked XSL appears to be buggy (it probably hasn't been maintained) and will not correctly transform all CSV files (e.g. the CSV example from Michael Kay's book). However, it is a good example to learn from.
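For anyone scripting this, here is a small sketch that assembles the colon-style command reported to work above. The function name saxon_csv_to_xml_cmd is hypothetical; the jar name, stylesheet name, template name, and pathToCSV parameter are taken from that command and are assumptions about your local setup.

```python
from typing import List


def saxon_csv_to_xml_cmd(
    csv_path: str,
    output_xml: str,
    stylesheet: str = "csv2xml.xsl",  # assumed local copy of the linked stylesheet
    jar: str = "saxon9he.jar",        # assumed Saxon 9 Home Edition jar
) -> List[str]:
    """Build the argv using the -o:value / -it:value option style."""
    return [
        "java", "-cp", jar, "net.sf.saxon.Transform",
        f"-o:{output_xml}",        # output file, joined to the option with a colon
        "-it:main",                # initial template named "main"
        stylesheet,
        f"pathToCSV={csv_path}",   # stylesheet parameter naming the CSV input
    ]


# Example: print the command line before running it yourself.
print(" ".join(saxon_csv_to_xml_cmd("csvfile.csv", "csvfile.xml")))
```

Passing the list to subprocess.run (rather than a shell string) avoids quoting problems when paths contain spaces.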