Wireshark package beginning identification - binary

I have a ".pcapng" binary file, created by Wireshark.
How to detect the beginning of every new package in it?
Is there any specific bytes sequence?
Alternatively, how to detect the end of a package?

(I've seen people whose native language isn't English speak of "packages" rather than "packets" - both words come from the same word "pack", and the same word may be used for both concepts in other languages - so I'm assuming you're referring to network packets; "packages" is generally not used in that sense in English.)
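The pcap-NG file format is described in the PCAP Next Generation Dump File Format document. A pcap-NG file is a sequence of blocks; each block starts with a block type field followed by a block length field, and the length is repeated at the end of the block to simplify scanning backwards through a file. There is no special byte sequence that marks a packet; you find packets by walking the file block by block. Not all blocks contain packets; the blocks that do are the Packet Block (obsolete), the Enhanced Packet Block, and the Simple Packet Block.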
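As a rough illustration of walking the block structure, here is a minimal Python sketch. It assumes the whole file fits in memory, and the names iter_blocks and PACKET_BLOCK_TYPES are made up for this example; it only locates blocks and does not decode packet contents or options.

```python
import struct

SECTION_HEADER = 0x0A0D0D0A  # Section Header Block type (reads the same in either byte order)

# Block types that carry packet data
PACKET_BLOCK_TYPES = {
    0x00000002: "Packet Block (obsolete)",
    0x00000003: "Simple Packet Block",
    0x00000006: "Enhanced Packet Block",
}


def iter_blocks(data):
    """Yield (offset, block_type, total_length) for each block in pcap-NG bytes."""
    endian = "<"
    pos = 0
    while pos + 12 <= len(data):
        (block_type,) = struct.unpack_from(endian + "I", data, pos)
        if block_type == SECTION_HEADER:
            # The byte-order magic 0x1A2B3C4D at offset 8 of a Section Header
            # Block tells us the endianness of the section that follows.
            magic = data[pos + 8:pos + 12]
            endian = "<" if magic == b"\x4d\x3c\x2b\x1a" else ">"
        (total_length,) = struct.unpack_from(endian + "I", data, pos + 4)
        if total_length < 12 or total_length % 4 or pos + total_length > len(data):
            raise ValueError(f"bad block length {total_length} at offset {pos}")
        # The length is repeated in the last 4 bytes of the block.
        (trailer,) = struct.unpack_from(endian + "I", data, pos + total_length - 4)
        if trailer != total_length:
            raise ValueError(f"length mismatch at offset {pos}")
        yield pos, block_type, total_length
        pos += total_length
```

To list where packets start, filter on the block type, for example: [off for off, t, _ in iter_blocks(data) if t in PACKET_BLOCK_TYPES]. The end of a packet block is simply its offset plus its total length.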
Note that libpcap 1.1 and later can read pcap-NG files, so any program that uses libpcap to read capture files can read some pcap-NG files using the same APIs that are used to read pcap files, without any change to the program, provided it is either dynamically linked with libpcap and running on a system where the libpcap shared library is 1.1 or later, or statically linked with libpcap 1.1 or later. (pcap-NG files containing multiple interfaces where not all of them have the same link-layer header type or snapshot length cannot be read, as the current libpcap APIs don't support that.) There is no version of WinPcap based on libpcap 1.1 or later, so WinPcap cannot currently be used to read pcap-NG files.
Another library that can read pcap-NG files is the NTAR library. It, however, can only read pcap-NG files, not pcap files.

Related

Where is the difference between "binaries" and "executables" in the context of an executable program?

I often see the terms "binary" and "executable" seemingly used interchangeably for the same thing.
Aren't they two terms for the exact same thing: the executable program produced by a compilation process, which I can run from the terminal?
What strengthens my assumption that the two are the same is the common practice of providing a bin folder ("bin" as an abbreviation for "binaries") inside an application's installation folder to hold the executable files that users are able to run.
I have read the question and answers of What's the difference between binary and executable files mentioned in ndisasm's manual? but the question and their answer are more focused on the respective environments of Clang and ndisasm.
I've also read the question and answers at https://softwareengineering.stackexchange.com/questions/121224/what-are-binaries on the Software Engineering forum, but there too no distinction is drawn between an executable and a binary; the answers only cover what the term "binary" can refer to in general:
But, in Computing, Binary refers to :
Binary file, composed of something other than human-readable text
Executable, a type of binary file that contains machine code for the computer to execute
Binary code, the digital representation of text and data
[Source: https://softwareengineering.stackexchange.com/a/121235/349225]
where, in the context of the output program of a compilation process, a binary was referred to as the same as an executable, as well as:
The word binaries is used as a set of files which are produced after compiling essentially the object code that runs on machines. (and virtual machines/runtimes in case of Java/.NET)
[Source: https://softwareengineering.stackexchange.com/a/121234/349225]
where the two were again treated as the same thing.
What is the difference between "binaries" and "executables" in the context of an executable program?
Where is the distinction?
An executable file is one which can be executed; you would run it on the command line by writing the name of the file itself as the command. On Unix systems, the file's "executable" flag must also be set. On Windows, the file's extension must be one of a fixed set of executable file extensions, including .exe.
A binary file is simply one in a binary (i.e. non-text) format. The binary format means that the file's contents should not be transformed for platform-specific reasons (e.g. replacing newlines from \n to \r\n).
Binary files are not necessarily executable, for example a library compiled to .dll or .so form is a binary but not an executable. A Java program compiled to .class or .jar form is not an executable file, but might be run using the command java -jar program.jar rather than the command ./program.jar.
Executable files are not necessarily binary, for example a Python script in text form can be made executable on Unix systems by writing a shebang line #!/usr/bin/python3 and setting the file's executable flag.
It helps to understand the context of the term "binary" here. It originates from compilers, which take the (text-based) source code of a program and turn that source code into an executable form which is binary, not text-based. Thus in the context of compilers, "text" and "source code" are equivalent, as are "binary" and "executable". Interpreters on the other hand do not make the distinction between source code and executable code.
Things definitely got more complex over time with intermediate representations, such as used by the Java JVM, .Net's CLI or Python bytecode.
It depends on the definition. All files contain binary code and a "working" definition is the following:
Binary or text files
File's binary code encodes text: text file
File's binary code does not encode text: binary file
Executable or non-executable files
File can be executed: executable file
File can not be executed: non-executable file
Therefore binary and executable are orthogonal properties, i.e. any file can have either of them or not, independently of the other. Some examples:
Binary executable: The .exe files in Windows, .app in macOS (not a single file but a bundle) etc.
Text executable: Python scripts, bash scripts (require the corresponding interpreters) etc
Binary non-executable: PDF files, audio files etc
Text non-executable: C++ source code, a markdown file etc
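The two axes can be checked independently, which a short Python sketch makes concrete. This is only a heuristic under stated assumptions: "text" here means "no NUL bytes and valid UTF-8 in the first 4 KiB", "executable" means whatever os.access reports for the current user, and the function names looks_binary and classify are invented for the example.

```python
import codecs
import os


def looks_binary(data: bytes) -> bool:
    """Heuristic: treat data as binary if it has NUL bytes or is not valid UTF-8."""
    if b"\x00" in data:
        return True
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off at the end of the sample
        decoder.decode(data, final=False)
    except UnicodeDecodeError:
        return True
    return False


def classify(path: str) -> str:
    """Return e.g. 'text executable' or 'binary non-executable' for a file."""
    with open(path, "rb") as f:
        sample = f.read(4096)
    kind = "binary" if looks_binary(sample) else "text"
    # os.access checks the executable permission on Unix; on Windows its
    # answer is much less meaningful, since executability depends on the extension.
    runnable = "executable" if os.access(path, os.X_OK) else "non-executable"
    return f"{kind} {runnable}"
```

On a Unix system, a Python script with a shebang line and the executable flag set would come out as "text executable", while a shared library without that flag would come out as "binary non-executable".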

Parsing Protocol-Buffers without .proto file

I am reverse-engineering an Android app as part of a security project. My first step is to discover the protocol exchanged between the app and server. I have found that the protocol being used is protocol buffers. Given the nature of protobuf, the original .proto file is needed to be able to unserialize the protobuf-encoded message. Since I don't have that, I used protod to disassemble the Android app and recover any .proto files used.
I have the Android app in a form where it is a bunch of .smali and .so files. Running protod against the .so files yields only one .proto file -- google/protobuf/descriptor.proto.
I was under the impression that users of protocol buffers write their own .proto files, which might reference google/protobuf/descriptor.proto, but according to protod google/protobuf/descriptor.proto is the only protofile used by the app. Could this actually be possible and google/protobuf/descriptor.proto is enough for me to unserialize the messages between the app and server?
When you write a .proto file you can set the option optimize_for to LITE_RUNTIME, and this will omit the descriptors from the generated code to reduce the size of the binary. I believe this is a common practice for mobile development since code size is a scarce resource in that environment. This may explain why you found only a single .proto file. It is unlikely that the app is actually transferring any data using descriptor.proto since that is mostly an implementation detail of the protocol buffers library.
If you cannot find any other descriptors, your best bet might be to try to interpret the protocol buffers without them. The wire format is described in the protocol buffers encoding documentation. An easy way to get started would be to create a proto2 message type containing no fields and attempt to parse the data as that type. You can then use the reflection API to examine what are known as the "unknown fields" in the message and try to figure out what they represent.
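To give a feel for what can be recovered without descriptors, here is a small Python sketch that decodes the wire format by hand instead of going through the protobuf library. The function names read_varint and decode_raw are made up for this example, and it only handles wire types 0, 1, 2 and 5 (the deprecated group types are rejected). Field names and exact types are not in the data; a length-delimited value might be a string, raw bytes, a nested message or a packed repeated field, so that part remains guesswork.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_raw(buf):
    """Return a list of (field_number, wire_type, value) from a serialized message."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:        # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:      # fixed 64-bit, kept as raw bytes
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:      # length-delimited
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:      # fixed 32-bit, kept as raw bytes
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if pos > len(buf):
            raise ValueError("truncated field")
        fields.append((field_number, wire_type, value))
    return fields
```

For instance, decode_raw(bytes.fromhex("089601")) yields [(1, 0, 150)]: field number 1, a varint with value 150. If a length-delimited value itself decodes cleanly with decode_raw, it is a good candidate for a nested message.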

loading Windows executable - unexpected data appended to sections after loading in memory

I'm trying to write a program that analyses Windows executables. I was assuming that the sections in an executable file are mapped directly into memory, but I have noticed strange behaviour in several programs. One example is crackme12.exe. When I inspect the .rdata section loaded into memory with a debugger, I can see that 96 bytes that are not in the executable file have been added at the beginning of the section. I have spent 2 days reading the Windows executable documentation, but I can't find an explanation of why this is happening.
One explanation might be that the program itself has put a stream in the memory section, this is not unusual. You will not find this kind of explanation in the Portable Executable Documentation. Some (malware) executables also replace or add new sections. Other (obfuscated) executables will expand existing empty file sections to non-empty memory sections.
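When comparing the file with the memory image, it helps to look at the section table first: each section is loaded at its VirtualAddress with its VirtualSize, which need not match the PointerToRawData and SizeOfRawData used in the file, and any part of the virtual size beyond the raw data is zero-filled. The Python sketch below prints those four values; the function name list_sections is made up for this example, and it will not by itself explain the 96 bytes in the question.

```python
import struct


def list_sections(data):
    """Return (name, virtual_size, virtual_address, raw_size, raw_pointer) per section."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4
    (num_sections,) = struct.unpack_from("<H", data, coff + 2)
    (opt_header_size,) = struct.unpack_from("<H", data, coff + 16)
    # The section table follows the 20-byte COFF header and the optional header
    table = coff + 20 + opt_header_size
    sections = []
    for i in range(num_sections):
        name, vsize, vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<8sIIII", data, table + 40 * i  # each section header is 40 bytes
        )
        label = name.rstrip(b"\0").decode("ascii", "replace")
        sections.append((label, vsize, vaddr, raw_size, raw_ptr))
    return sections
```

A section whose virtual size is larger than its raw size, or whose raw size is zero, is one that will necessarily look different in memory than in the file.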

Reverse engineering a custom data file

At my place of work we have a legacy document management system that for various reasons is now unsupported by the developers. I have been asked to look into extracting the documents contained in this system to eventually be imported into a new 3rd party system.
From tracing and process monitoring I have determined that the document images (mainly tiff files) are stored in a number of 1.5GB files. These files seem to be read from a specific offset and then written to a tmp file that is then served via a web app to the client, and then deleted.
I guess I am looking for suggestions as to how I can inspect these large files that contain the tiff images, and eventually extract and write them to individual files.
Are the TIFFs compressed in some way? If not, then your job may be pretty easy: stitch the TIFFs together from the 1.5G files.
Can you see the output of a particular 1.5G file (or series of them)? If so, then you should be able to piece together what the bytes should look like for that TIFF if it were uncompressed.
If the bytes don't appear to be there, then try some standard compression or archive formats (zip, tar, etc.) to see if you get a match.
I'd open a file, seek to the required offset, and then stream into a tiff object (ideally one that supports streaming from memory or file). Then you've got it. Poke around at some of the other bits, as there's likely metadata about the document that may be useful to the next system.
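A minimal Python sketch of that approach follows, assuming the TIFFs are stored uncompressed. The helper names read_range and find_tiff_offsets are invented for the example. The second helper looks for the two standard TIFF signatures (II*\0 for little-endian, MM\0* for big-endian); these four bytes can also occur by coincidence inside other data, so hits should be checked against the offsets observed while tracing.

```python
import re

# TIFF files start with "II" + 42 (little-endian) or "MM" + 42 (big-endian)
TIFF_MAGIC = re.compile(rb"II\*\x00|MM\x00\*")


def read_range(path, offset, length):
    """Seek to offset in the container file and return length bytes."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def find_tiff_offsets(data):
    """Return every offset in data where a TIFF signature appears."""
    return [m.start() for m in TIFF_MAGIC.finditer(data)]
```

For the 1.5GB files it would be more practical to scan in chunks or through mmap rather than reading a whole file into memory; the pattern search itself stays the same.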

What is the difference between plain binary format (.bin) and Windows Executable (.exe)?

What is the difference between plain binary format (.bin) and Windows Executable (.exe)?
I'm not sure what a "bin" file is in this case. Could be a firmware, could be an object file, could be anything really (it depends on context).
When talking about executables (exe files in the case of Windows), these are usually self-contained packages with everything required to run them packed in. These file formats usually contain all the executable data, strings and other resources, linking data and exports, offsets, and other data. They have everything required for the OS to set up an environment to run them, such as the dependent libraries that need to be loaded, the architecture it needs to run on, etc.
There are lots of different ones in common use:
PE your standard windows executable and dll format (http://en.wikipedia.org/wiki/Portable_Executable)
ELF used by Linux and other UNIX clones (http://en.wikipedia.org/wiki/Executable_and_Linkable_Format)
Mach-O used by your Mac executables (http://en.wikipedia.org/wiki/Mach-O)
a.out sort of legacy executable package (http://en.wikipedia.org/wiki/A.out)
Lots of others (COFF, COM, etc).
If the operating system supports dynamically linkable libraries (dlls on windows, .so files on linux, dylibs on mac) then they usually share this same packaging format.
There's no such thing as plain binary format. There's no known standard for what is in ".bin" files. Expect any data.
EXE is a file with a well-defined structure for storing code. The format is called "Portable Executable" (PE); the file starts with a DOS header beginning with the bytes MZ, which in turn points to the PE header.
http://en.wikipedia.org/wiki/Portable_Executable
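Because a ".bin" extension says nothing about the contents, the practical way to tell the formats apart is to look at the leading bytes. The Python sketch below checks a few well-known signatures; the function name sniff_format is made up for this example, and it covers only PE, ELF and thin Mach-O files, returning "unknown" for everything else.

```python
import struct

# Mach-O magic numbers as they appear on disk (32/64-bit, both byte orders)
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe",
    b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe",
}


def sniff_format(data):
    """Guess the executable format of data from its signature bytes."""
    if data[:4] == b"\x7fELF":
        return "ELF"
    if data[:4] in MACHO_MAGICS:
        return "Mach-O"
    if data[:2] == b"MZ" and len(data) >= 0x40:
        # Offset 0x3C of the DOS header holds the position of the PE signature
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\0\0":
            return "PE"
        return "MZ (DOS executable)"
    return "unknown"
```

A raw firmware image or memory dump named something.bin would typically fall through to "unknown", which matches the point that such a file has no defined structure.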
BIN:
The BIN file type is primarily associated with 'Binary File'. Binary files are used for a wide variety of content and can be associated with a great many different programs. In general, a .BIN file will look like garbage when viewed in a file editor
EXE:
The EXE file type is primarily associated with 'Executable File' by Microsoft Corporation. An executable file is basically another name for a program. Virtually all programs that run under Windows or DOS are in the .EXE format