How can I see the 0s and 1s / machine code from an executable file / object file? - binary

I already tried opening an a.out file with a text editor, but all I get is a bunch of unreadable characters with a few readable strings mixed in, like:
üÙ

Try hexdump. Something like:
$ hexdump -C a.out
It will give you just that: a hexadecimal dump of the file.
Having said that, another possibility is to use GDB's disassemble command.
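For example, a quick sketch (main here is just a stand-in for whichever function you want to look at):
$ gdb a.out
(gdb) disassemble /r main
The /r modifier asks GDB to print the raw opcode bytes next to each instruction.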

Look up your local friendly hex editor.

To see the disassembly (with opcode bytes) of the code only, not including any file headers:
objdump -d a.out

Executable files come in several formats. For Unix/Linux it's ELF:
http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
For Windows it's PE:
http://en.wikipedia.org/wiki/Portable_Executable
Use objdump to see the opcodes, as others have pointed out.

When you open a binary file in a text editor you see unreadable characters. The reason is that ASCII encoding uses only 7 bits, so bytes with values from 128 to 255 have no ASCII code point, won't be recognized by the editor, and show up as unknown characters.
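You can see this for yourself with a small experiment (just an illustrative sketch, using bash's printf): feed a few bytes, some below and some above 127, to hexdump's canonical mode, which prints printable ASCII on the right and a dot for anything else:
printf 'Hi\x7f\x80\xff' | hexdump -C
The right-hand column shows Hi followed by dots for the three bytes that have no printable ASCII representation.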
If you want to see all the contents of a binary file, you can use programs like hexdump, objdump and readelf. For example, let's say we want to dissect /bin/bash, an ELF binary on Linux, into its bytes in hexadecimal representation, so we say:
hexdump /bin/bash
But the best tool for this kind of file is readelf.
If we want to see all the contents of the binary file in a more human-readable format than the hexdump output, we can just say:
readelf -a /bin/bash
and we would see all the different sections of the binary file (ELF header, program headers, section headers and data).
Using other flags we could see only one header at a time, or just disassemble the .text section of the file, and so on.
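For example (these are standard readelf/objdump flags on a GNU toolchain):
readelf -h /bin/bash              # ELF header only
readelf -l /bin/bash              # program headers
readelf -S /bin/bash              # section headers
objdump -d -j .text /bin/bash     # disassemble only the .text section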

Related

How to read contents of PC Sampling Continuous output (dat file)?

I am looking into the code samples provided in /usr/local/cuda-11.8/extras/CUPTI/samples/pc_sampling_continuous/. I am trying to get the PC stall reasons from an executable using CUPTI, without modifying the source code. pc_sampling_continuous seems to do exactly that.
When I run the command:
./libpc_sampling_continuous.pl --collection-mode 1 --sampling-period 7 --file-name pcsampling.dat --verbose --app "./executable_name"
it outputs a .dat file.
To read this .dat file, /usr/local/cuda-11.8/extras/CUPTI/samples/pc_sampling_utility/ contains a utility. It is run with the command:
./pc_sampling_utility --file-name 1_pcsampling.dat
and it outputs the data in a human-readable format.
I have two problems with this:
Every line says lineNumber:0, fileName: ERROR_NO_CUBIN, dirName: . It does show me the stalls, but without correlating them to the source line numbers and the SASS, that is of little use.
The README says I should use the cubin file for source correlation. I am able to generate the cubin file (with cuobjdump -xelf all ./executable_name and renaming the result to 1.cubin), but I cannot figure out how to pass this cubin file together with the .dat file to pc_sampling_utility.
Any help is appreciated.

Including unicharambigs in the [lang].traineddata file (Tesseract)

I'm facing a problem training Tesseract OCR for Kannada fonts (Lohit Kannada and Kedage) when it comes to numerals.
For example, 0 is getting recognized as 8 (and ನ as ವ).
I need help including the unicharambigs file (the documentation on GitHub only describes its format). My output.txt file has not changed despite including the unicharambigs file.
Suppose [lang] corresponds to kan, will the following command include the unicharambigs file in the kan.traineddata file?
combine_tessdata kan.
In case it doesn't, I'd appreciate any help on how to proceed.
It is difficult to answer without knowing which version of Tesseract and kan.traineddata you're using.
You can unpack the kan.traineddata to see the version of kan.unicharambigs included in it and then recombine it after editing the file.
See https://github.com/tesseract-ocr/tesseract/blob/master/doc/combine_tessdata.1.asc for the command syntax.
Use -u option to unpack:
-u .traineddata PATHPREFIX Unpacks the .traineddata using the provided prefix.
Use -o option to overwrite unicharambigs:
-o .traineddata FILE…​: Overwrites the specified components of the .traineddata file with those provided on the command line.
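For example, something along these lines (assuming kan.traineddata and your edited kan.unicharambigs are in the current directory):
combine_tessdata -u kan.traineddata kan.
(edit kan.unicharambigs)
combine_tessdata -o kan.traineddata kan.unicharambigs
The first command unpacks all components with the prefix kan., and the last one writes your edited kan.unicharambigs back into kan.traineddata.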
Please note that https://github.com/tesseract-ocr/langdata/blob/master/kan/kan.unicharambigs seems to be a copy of eng.unicharambigs

Dump entire ELF binary into readable headers and sections

Is it possible to dump the entire binary structure of an ARM ELF file into a readable format? The idea is to analyze each section and save it back to a binary.
There are several tools for inspecting the inner structure of ELF files; the most common are readelf, objdump and elfsh (from the ERESI framework). It depends on which part of the binary you are interested in, but, for example, you can dump the section header table with the following command:
arm-elf-objdump -h elffile
If you want to copy a certain section from the elf file into a raw binary file, use objcopy:
arm-elf-objcopy -j .data -O binary elffile binfile
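If you are not sure which sections the file contains, you can list their names and sizes first and then extract them one at a time (keeping the arm-elf- toolchain prefix used above; the output file name is arbitrary):
arm-elf-readelf -S elffile
arm-elf-objcopy -j .text -O binary elffile text.bin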

Print html file with CUPS

Is there a way to explicitly tell the CUPS server that the file you are sending is text/html thus overriding the mime.types lookup?
Yes, there is.
Use this commandline:
lp -d printername -o document-format=text/html file.html
Update (in response to comments)
I provided an exact answer to the OP's question.
However, this (alone) does not guarantee that the file will be successfully printed. To achieve that, CUPS needs a filter which can process the input of MIME type text/html.
Such a filter is not provided by CUPS itself. However, it is easy to plug your own filter into the CUPS filtering system, and some Linux distributions ship such a filter capable of consuming HTML files and converting them to a printable format.
You can check what happens in such a situation on your system. The cupsfilter command is a helper utility to run available/installed CUPS filters without the need to do actual printing through the CUPS daemon:
touch 1.html
/usr/sbin/cupsfilter --list-filters 1.html
Now on a system with no HTML-consuming filter installed, you'd get this response:
cupsfilter: No filter to convert from text/html to application/pdf.
On a different system (like on a Mac), you'll see this:
xhtmltopdf
You can even force input and output MIME types to see which filters CUPS would run automatically when asked to print this file on a printer supporting that particular output MIME type (-i sets the input MIME type, -m the output):
/usr/sbin/cupsfilter \
-i text/html \
-m application/postscript \
--list-filters \
1.html
xhtmltopdf
cgpdftops
Here it would first convert HTML to PDF using xhtmltopdf, then transform the resulting PDF to PostScript using cgpdftops.
If you skip the --list-filters parameter, the command would actually even go ahead and do the conversion by actively running (not just listing) the two filters and emit the result to <stdout>.
You could write your own CUPS filter based on a Shell script. The only other ingredient you need is a command line tool, such as htmldoc or wkhtmltopdf, which is able to process HTML input and produce some format that in turn could be consumed by the CUPS filtering chain further down the road.
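As an illustration only, here is a minimal sketch of such a filter, assuming wkhtmltopdf is installed and that your CUPS installation loads filters from /usr/lib/cups/filter (the filter name htmltopdf and the paths are hypothetical):
#!/bin/sh
# /usr/lib/cups/filter/htmltopdf
# CUPS invokes filters as: job-id user title copies options [filename]
# and expects the converted document on stdout.
if [ -n "$6" ]; then
    # A filename was passed as the sixth argument.
    exec wkhtmltopdf --quiet "$6" -
else
    # Otherwise the job arrives on stdin.
    exec wkhtmltopdf --quiet - -
fi
To have cupsd actually pick it up, you would also register it in a *.convs file with a line such as text/html application/pdf 33 htmltopdf (again, an assumption about your setup; see the CUPS filter documentation for the exact mechanics).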
Be aware that some (especially JavaScript-heavy) HTML files cannot be successfully processed by simple command line tools into print-ready formats.
If you need more details about this, just ask another question...

How to display non-ASCII characters from an XML output

I get this output in an XML element:
&#163;111.00
It should be £111.00.
How can I sort this out so that the Unicode characters are displayed rather than the numeric codes? I am using the Linux tool wget to fetch the XML file from the Internet. Perhaps some sort of converter?
I am viewing the file in PuTTY. I am parsing the file and I want to clean the input before parsing.
I am using xml_grep2 to get the elements I want and then cat filename | while read ...
OK, I'm going to close this question now.
After parsing the file with xml_grep2 I was able to get clean output; however, I was seeing a stray Ã character in the file. I changed the PuTTY character-set setting from ISO-8859 to UTF-8 to resolve that.
You can use HTML::Entities to replace the entities with literal character codes. I don't know how good its coverage is, though. There are bound to be similar tools for other languages if you are not comfortable with Perl. http://metacpan.org/pod/HTML::Entities
sh$ echo '&#163;111.00' | perl -CSD -MHTML::Entities -pe 'decode_entities($_)'
£111.00
This won't work if the HTML::Entities module is not installed. If you need to install it, there are numerous tutorials about the CPAN on the Internet.
Edit: Add usage example. The -CSD option might not be necessary on your system, but on OSX at least, I got garbage output without it.
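Putting it together with wget (the URL is only a placeholder for wherever your XML actually comes from):
wget -qO- 'http://example.com/data.xml' | perl -CSD -MHTML::Entities -pe 'decode_entities($_)' > clean.xml
You can then run xml_grep2 on clean.xml with the entities already decoded.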