How to analyze binary file? - binary

I have a binary file. I don't know how it's formatted, I only know it comes from a delphi code.
Does it exist any way to analyze a binary file?
Does it exist any "pattern" to analyze and deserialize the binary content of a file with unknown format?

Try these:
Deserialize data: analyze how it's compiled your exe (try File Analyzer). Try to deserialize the binary data with the language discovered. Then serialize it in a xml format (language-indipendent) that every programming language can understand
Analyze the binary data: try to save various versions of the file with little variation and use a diff program to analyze the meaning of every bit with an hex editor. Use it in conjunction with binary hacking techniques (like How to crack a Binary File Format by Frans Faase)
Reverse Engineer the application: try getting code using reverse engineering tools for the programming language used for build the app (found with File Analyzer). Otherwise use disassembler analysis tool like IDA Pro Disassembler

For my hobby project I had to reverse engineer some old game files. My approaches were:
Have a good hex editor.
Look for readable words in the binary file. Note how their distribution is. If the distance between them is constant you know it is a listing.
Look for 2-3 consequent zeros. Might indicate an int32 value.
Some dwords might be pointers into the file.
Try to identify reoccurring patterns in the file.
Seeing lots of C0-CF might indicate RLE compressed data.

I've developed Hexinator (Window & Linux) and Synalyze It! (macOS) exactly for this purpose. These applications allow you to see the binary files like in other hex editors but additionally you can create a "grammar" with the specifics of a binary file format. The grammar contains all the building blocks and is used to parse the file automatically.
Thus you can keep the knowledge you gain in the analysis and apply it to multiple files simultaneously. You can also color-code the bits and pieces of file formats for a quick overview in the hex editor.
The parsing results are displayed in a tree view where you can also modify the files easily (applying endianness et cetera).

Reverse engineering a binary file when you have some idea of what it represents is a very time consuming process. If you have no idea what it is then it will be even harder.
It is possible though, but you have to have a pretty good reason for doing so.
The first step would be to open it up in a hex editor of your choice and see if you can find any English text to point you in the direction of what the file is even supposed to represent. From there, Google "Reverse Engineering binary files", there are much more knowledgeable people than me that have written guides about it.

The "strings" program from GNU binutils is very useful. It will print the strings of printable characters in a file, quite often giving a clue to what a file contains or a program does.

If the data represents serialized Delphi objects, you should start reading about the Delphi serialization process. If that's the case, I think your best bet would be to load it using Delphi and continue your analysis from the IDE. Some informations about Delphi serialization can be found here.
EDIT: if the file does contain serialized delphi objects, then you should write a small delphi program that loads it, and "convert" the data yourself to something neutral, like xml. If you manage to do this, you should check and see if delphi supports serializing to xml. Then, you could access those objects from any language.

The unix "file" command is really useful - I don't know if there is anything like it in windows. You run it like this:
file myfile.ext
And it spits out a text description based on the magic numbers and data contained therein.
Probably it is contained within cygwin.

If you have access to the application that creates the file, you can apply changes to the application, then save the file and see the effects (Keep in mind that numbers are probably stored in little endian):
First create the file repeatedly. If the files are not binary equal, the current date/time is probably stored in the area where hte differences occur.
Maybe you want to repeat that with the software running under different environments, to see if OS version etc are stored, but this is rather unusual.
Next you can try to change single variables and create several files that only differ in the value of this variable. This helps you identify where this variable is stored.
That way you can also exclude variables that are not stored in the file: If you change them, but the files created are identical, they are not stored.
In order to test the hypotheses you worked out with the steps above, edit one of the files and have the application read it.
If you don't have access to the application itself, I suggest that you forget about it and find another way to solve your problem. There is a very high probability that it will be faster...

If file does not give a meaningful answer, you may want to try TRiD by Marco Pontello to determine whether your data is stored in a known format.

Get the Delphi application and open it in IDA Pro freeware version, and find where it writes the file, and decode how it writes the file that way.
Unless it's plan text.

Do you know the program that uses it? If so you can hook that programs write to file function and get an idea of what data its writing, the size of the data and where.
More Info: http://www.codeproject.com/KB/DLL/Win32APIHooking_Trouble.aspx

Unlike traditional hex editors which only display the raw hex bytes of a file, 010 Editor can also parse a file into a hierarchical structure using a Binary Template. The results of running a Binary Template are much easier to understand and edit than using just the raw hex bytes.
http://www.sweetscape.com/010editor/

Try to open it in a hex editor and analyse.

Related

How should a "project" file be written?

With popular software packages, like Microsoft Word or Photoshop, we often have an option to save our progress as a "project" file and later can open that file to edit our works furthermore. This file often contains all the options and the progress that the user has made (i.e the essay you typed in Word).
So my question is, if I am doing a similar application that requires creating a similar "project" file, how should I go about this? My application is a scientific application, which means it required a lot of (multi-dimension) arrays. I understand there will be a lot of options to do this, but I would like to know the de facto way.
Here are some of the options I have outline out:
XML: Human readable. The size is too big and it's too much work to deal with arrays.
JSON: More popular/modern. Good with array.
Protocol Buffer: It is created by Google. Probably faster.
Database: Probably not a good use case since "project" files are most likely "temporary". Also, working with arrays is not very straight forward.
Creating your own binary format: Might be the most difficult solution for an inexperienced programmer like myself.
???
I would like to get some advice from you guys. Thank you :).
(Good question. :) Only some thoughts) I'd prefer text format for the main project file. You can make diffs and open and read and modify it easily. Large ascii or binary data can be stored as serialized data in external files or in a database like SQLite from where it can be easily accessed and processed through the application. The main project has links to the external data store. My advice for the main project file is a simple XML format that can easily be transformed to JSON format. A list of key value pairs (dict) is good for the beginning. value can be of basic datatype or be an array or dict. A complicated XML tree is not good. The key name can also help to describe and structure data. So i'd prefer key="rect.4711.pos.x" value="500" and not <rect id="4711"><pos><x>500</x>...</pos>.... Important aspect is that the project data is portable and self-contained, and the user can see the project as a single unit even if it is a directory on the file system, for this purpose supporting some kind of zipped format of project data is good.

Can I write a program in binary directly ? How can I get the computer to execute it?

I know that may seem weird and looking for troubles but I think experiencing what the ancient programmers experienced before is something interesting. So how can I execute a program written only in binary? (Suppose that I know what I am doing and not using assembly of course.)
I just want to write a series of bits like 111010111010101010101 and execute that. So how can I do that?
Use a hex editor. You'll need to find out the relevant executable format for your operating system, of course - assuming you want to use an operating system... I suppose you could always write your own bootloader and just run the code directly that way, if you want to get all hardcore.
I don't think you'll really be experiencing what programmers experienced back then though - for one thing, you won't be using punch cards, paper tape etc. For another, your context is completely different - you know what computers are like now, so it'll feel painfully primitive to you... whereas back then, it would have been bleeding edge and exciting just on those grounds.
Use a hex editor, write your bits and save it as an executable file (either just with the file extension .exe in Windows or with chmod a+x filename in Linux).
The problem is: You'd also have to write all the OS-specific stuff in binary format, and you'll have to have a table that translates from assembler code to binary stuff.
Why not, if you want to experience low-level programming, give D.E. Knuth's assembler MMIX a try?
It really depends on the platform you are using. But that's sort of irrelevant based on your proposed purpose. The earliest programmers of modern computers as you think of them did not program in binary -- they programmed in assembly.
You will learn nothing trying to program in binary for a specific Operating System and specific CPU type using a hex editor.
If you want to find out how pre-assembly programmers worked (with plain binary data), look up Punch Cards.
.
Use a hex editor to create your file, be sure to use a format that the loader of your respective OS understands and then double click it.
most assemblers (MMIX assembler for instance see www.mmix.cs.hm.edu) dont care if
you write instructions or data.
So instead of wirting
Main ADD $0,$0,3
SUB $1,$0,4
...
you can write
Main TETRA #21000003
TETRA #25010004
...
So this way you can assemble your program by hand and then have the assembler transform it in a form the loader needs. Then you execute it. Normaly you use hex notatition not binary because keeping track of so many digits is difficult. You can also use decimal, but the charts that tell you which instructions have which codes are typically in hex notation.
Good luck! I had to do things like this when I started programming computers. Everybody was glad to have an assembler or even a compiler then.
Martin
Or he is just writing some malicious code.
I've seen some funny methods that use a AVR as a keyboard emulator, open some simple text editor, write the code that's in the AVR eeprom memory, and pipe it to "debug" (in windows systems), and run it. It's a good way to escape some restrictions too ;)
I imagine that by interacting directly with hardware you could write in binary. To flip the proper binary bits, you could use a magnetized needle on your disk drive. Or butterflies.

What is the best file format to parse?

Scenario: I'm working on a rails app that will take data entry in the form of uploaded text-based files. I need to parse these files before importing the data. I can choose the file type uploaded to the app; the software (Microsoft Access) used by those uploading has several export options regarding file type.
While it may be insignificant, I was wondering if there is a specific file type that is most efficiently parsed. This question can be viewed as language-independent, I believe.
(While XML is commonly parsed, it is not a feasible file type for sake of this project.)
If it is something exported by Access, the easiest would be CSV; particularly since Ruby contains a CSV parser in the standard library. You will have to do some work determining the dialect of CSV (what it uses for delimiter, how it handles quotes); I don't know how robust the ruby parser is with those issues, but you also should have some control from Microsoft Access.
You might want to take a look at JSON. It's a lightweight format, and in contrast to XML it's really easy and clean to parse without requiring a huge library on the backend.
It can represent types like strings, numbers, assosiative arrays (objects), and lists of such
I would suggest n-SV (where n is some character) for data that does not include n. That will make lexing the files a matter of a split.
If you have more flexible data, I would suggest JSON.
If you've HAVE to roll your own parser, I would suggest CSV or some form of a delimiter separated format.
If you are able to use other libraries, there are plenty of options. JSON looks quite fascinating.

Decipher binary format of file

I have a binary file to which I'm trying to write however I dont have the file format specification nor have found it using google, I've been looking at the file using a hex editor but so far has only give me a headache, is there a better way to decipher the format of the file so that I can append data to it?
File carving tools such as scalpel won't really help here. They're made for extracting files with known header and/or footer signatures from a memory dump or some larger, composite file.
For your scenario, I would recommend a hex editor with templating capability, like the 010 Editor. This will allow you to name and annotate "fields" in the binary as you learn more about what each part of the file does. Unfortunately, the process of finding out what each field does is mostly manual. As a methodology, just start playing with it. Change some values in your current binary and see what happens. Expect to spend significant time on it, but also enjoy the process!
you may want to search it with a open source forensic application like foremost or scalpel. They will do most of the grunt work for you, you just likely wont learn anything.

How can I analyze a closed format (e.g. doc or vce)?

I want to study the .vce format. It's a binary format and it seems more complicated than a simple object serialization. Does it exist any tool or technique to analyze a binary format?
You might need to "Reverse-Code-Engineer" a programm using this file format (http://www.openrce.org/). Tools used for this kind of analysis are: brain, disassembler (IDA Pro for example) and Debugger (OllyDBG for example). But beware - the way for successfull reverse engineering a file format is veeeeeerrry hard.
And reversing an application might be illegal depending on where you live!
You'll have to get a library that can read the format (or create one yourself).
Here is some of the microsoft office binary format specifications
I believe it would only be possible through some nasty reversed-engineering. It would be very useful to have access to application that uses mentioned format, so that you can generate few simple files and compare them in hex editor. You cannot get far with this method, but you might be able to figure out the header.
It would also be useful to study some binary format mechanisms, such as encryption and compression. If you're talking about Visual CertExam file format, than it is likely that useful data will be strongly encrypted.
My 2 cents:
Start by reversing the application reading the files themselves. Particularly android applications are helpful, as the resulting java source is easier to read (you might want to try A+ vce reader for android for example). This program indicates that vce uses/embeds sqlite in the file (in line with what is hinted here: Reverse Engineer a File Format).
Where to go from here? You might want to explore sqlite file carving tools to see if there might be a way to programatically identify the patterns in the file. Good luck!