How to edit a binary at specific address locations using Nim? - binary

I'm trying to understand how to modify a Windows executable using Nim to write NOPs in specific locations of the file. In Python I do this as follows:
with open('file.exe', 'r+b') as f:
f.seek(0x0014bc0f)
f.write(binascii.unhexlify('9090'))
f.seek(0x0014bc18)
f.write(binascii.unhexlify('9090'))
But in Nim I can't find the necessary methods to accomplish the seeking to a specific address and how to input these op-codes properly. I'm pretty new to Nim and the different file operation possibilities are a little confusing.

the docs for io are quite good, but to start you off:
opening a file for reading and writing (equivalent of 'r+') is:
let f = open("file.exe",fmReadWriteExisting)
the equivalent of with here, to make sure to close the file on any error:
defer: f.close
seeking to an offset (by default bytes relative to the start):
f.setFilePos(0x014bc0f)
writing two bytes at that position (nop nop)
f.write("\x90\x90")
check out the documentation for each proc for more options

Related

Config File Checksum guessing (CRC)

I'm currently "hacking" an old 3d Printer, built in 1996. There is Software running on an old Windows PC. I need to modify some parameters which are not accessible from the front end, so I wanted to modify the config files. But if I modify something, it could not be read anymore. I noticed, that there is a checksum at the end of the file, and I'm not really an checksum expert. I assume that, while loading the file, this checksum is calculated again and compared to the one at the end.
I'm having trouble finding out which checksum algorithm is used.
What I already found out: I think it's not just an addition of the bits in the file. When I'm switching two characters, an checksum, that is generated with addition, would not change. But the software won't take that file.
I'm guessing its some kind of CRC16, because a checksum looks like that:
0x4f20
As I have calculated that number with several usual CRC16 parameters and could not find a match with the "4f20", I assume that it must be an custom CRC16..
Here is a complete sample file:
PACKET noname
style 502
last_modified 1511855084 # Tue Nov 28 08:44:44 2017
STRUCTURE MACHINE_OVRL
PARAM distance_units
Value = "millimeters"
ENDPARAM
PARAM language
Value = "English"
ENDPARAM
ENDSTRUCTURE
ENDPACKET
checksum 0x4f20
I think either the checksum itself or the complete line "checksum 0x4f20" is not being considered while calculated, because thats not possible (?)
Any help is appreciated.
Edit: I got some more files with checksums of course, but these are a lot longer than this file. If needed, I could provide them too..
RevEng was written for this purpose. Given several examples of the input and the associated CRCs, RevEng will derive the CRC parameters. If it is a CRC.

Iterating over a string in Vimscript or Parse a JSON file

So I'm creating a vim script that needs to load and parse a JSON file into a local object graph. I searched and I couldn't find any native way to process a JSON file, and I don't want to add any dependencies to the script. So I wrote my own function to parse the JSON string (gotten from the file), but it's really slow. At the moment, I iterate through each character in the file like so:
let len = strlen(jsonString) - 1
let i = 0
while i < len
let c = strpart(jsonString, i, 1)
let i += 1
" A lot of code to process file....
" Note: I've tried short cutting the process by searching for enclosing double-quotes when I come across the initial double quotes (also taking into account escaping '\' character. It doesn't help
endwhile
I've also tried this method:
for c in split(jsonString, '\zs')
" Do a lot of parsing ....
endfor
For reference, a file with ~29,000 characters takes about 4 seconds to process, which is unacceptable.
Is there a better way to iterate over a string in vim script?
Or better yet, have I missed a native function to parse JSON?
Update:
I asked for no dependencies because I:
Didn't want to deal with them
Genuinely wanted some ideas for best way to do this without someone else's work.
Sometimes I just like to do things manually even though the problem has already been solved.
I'm not against plugins or dependencies at all, it's just that I'm curious. Thus the question.
I ended up creating my own function to parse the JSON file. I was creating a script that could parse the package.json file associated with node.js modules. Because of this, I could rely on a fairly consistent format and quit the processing whenever I'd retrieved the information I needed. This usually cut out large chunks of the file since most developers put the largest chunk of the file, their "readme" section, at the end. Because the package.json file is strictly defined, I left the process somewhat fragile. It assumed a root dictionary { } and actively looks for certain entries. You can find the script here: https://github.com/ahayman/vim-nodejs-complete/blob/master/after/ftplugin/javascript.vim#L33.
Of course, this doesn't answer my own question. It's only the solution to my unique problem. I'll wait a few days for new answers and pick the best one before the bounty ends (already set an alarm on my phone).
The simplest solution with the least dependencies is just using the json_decode vim function.
let dict = json_decode(jsonString)
Even though Vim's origin dates back a lot it happens that its internal string() eval() representation is that close to JSON that its likely to work unless you need special characters.
You can lookup the implementation here which even supports true/false/null if you want:
https://github.com/MarcWeber/vim-addon-json-encoding
Better use that library (vim-addon-manager allows to install dependencies easily).
Now it depends on your data whether this is good enough.
Now Benjamin Klein posted your question to vim_use which is why I'm replying.
Best and fast replies happen if you subscribe to the Vim mailinglist.
Goto vim.sf.net and follow the community link.
You cannot expect the Vim community to scrape stackoverflow.
I've added the keyword "json" and "parsing" to that little code that it can be found easier.
If this solution does not work for you you can try the many :h if_* bindings or write an external script which extracts the information you're looking for, or turns JSON into Vim's dictionary representation which can be read by eval() escaping special characters you care about correctly.
If you seek for completely correct solution omitting dependencies is one of the worst thing you can do. The eval() variant mentioned by #MarcWeber is one of the fastest, but it has its disadvantages:
Using solution for securing eval I mentioned in comment makes it no longer the fastest. In fact after you use this it makes eval() slower by more then an order of magnitude (0.02s vs 0.53s in my test).
It does not respect surrogate pairs.
It cannot be used to verify that you have correct JSON: it accepts some strings (e.g. "\<C-o>") that are not JSON strings and it allows trailing commas.
It fails to give normal error messages. It fails badly if you use vam#VerifyIsJSON I mentioned in p.1.
It fails to load floating point values like 1e10 (vim requires numbers to look like 1.0e10, but numbers like 1e10 are allowed: note “and/or” in the first paragraph).
. All of the above (except for the first) statements also apply to vim-addon-json-encoding mentioned by #MarcWeber because it uses eval. There are some other possibilities:
Fastest and the most correct is using python: pyeval('json.loads(vim.eval("varname"))'). Not faster then eval, but fastest among other possibilities. (0.04 in my test: approximately two times slower then eval())
Note that I use pyeval() here. If you want solution for vim version that lacks this functionality it will no longer be one of the fastest.
Use my json.vim plugin. It has an advantages of slightly better error reporting compared to failed vam#VerifyIsJSON, slightly worse compared to eval() and it correctly loads floating-point numbers. It can be used for verification of strings (it does not accept "\<C-a>"), but it loads lists with trailing comma just fine. It does not support surrogate pairs. It is also very slow: in the test I used (it uses 279702 character long strings) it takes 11.59s to load. Json.vim tries to use python if possible though.
For the best error reporting you can take yaml.vim and purge YAML support out of it leaving only JSON (I once have done the same thing for pyyaml, though in python: see markedjson library used in powerline: it is pyyaml minus YAML stuff plus classes with marks). But this variant is even slower then json.vim and should only be used if the main thing you need is error reporting: 207 seconds for loading the same 279702 character long string.
Note that the only variant mentioned that satisfies both requirements “no dependencies” and “no python” is eval(). If you are not fine with its disadvantages you have to throw away one or both of these requirements. Or copy-paste code. Though if you take speed into account only two candidates are left: eval() and python: if you want to parse json fast you really must use C and only these solutions spend most time in functions written in C.
Most other interpreters (ruby/perl/TCL) do not have pyeval() equivalent so they will be slower even if their JSON implementation is written in C. Some other (lua/racket (mzscheme)) have pyeval() equivalent, but e.g. luaeval('{}') is zero meaning that you will have to add additional step explicitly and recursively converting objects into vim dictionaries and lists (e.g. luaeval('vim.dict({})')) which will impact performance. Cannot say anything about mzeval(), but I have never heard about anybody actually using racket (mzscheme) with vim.

An exported aliases symbol doesn't exist in PDB file (RegisterClipboardFormat has RegisterWindowMessage internal name)

I'm trying to set a breakpoint in user32!RegisterClipboardFormat
Evidently, this function is exported (link /dump /exports - it is right there). Before downloading the PDB file from the Microsoft symbol server, I'm able to find this function:
0:001> lm m user32
start end
76eb0000 76fcf000 USER32 (export symbols) c:\Windows\system32\USER32.dll
0:001> x user32!RegisterClipboardFormat*
76ec4eae USER32!RegisterClipboardFormatA (<no parameter info>)
76ec6ffa USER32!RegisterClipboardFormatW (<no parameter info>)
No problems. I'm able to 'bu' any of these functions. But when I download the PDB symbols from the Microsoft PDB server:
0:001>
start end module name
76d50000 76e6f000 USER32 (pdb symbols) c:\symbols\user32.pdb\561A146545614951BDB6282F2E3522F72\user32.pdb
0:000> x user32!RegisterClipboardFormat
WinDBG cannot find the symbols. However, it can find RegisterWindowMesssage:
0:000> x user32!RegisterWindowMessage*
76d64eae USER32!RegisterWindowMessageA = <no type information>
76d66ffa USER32!RegisterWindowMessageW = <no type information>
Note that the functions have the same addresses (This is on Windows 8. Not sure about previous versions). This is probably achieved by the optimizer or in the DEF file (func1=func2 in the EXPORT section). 'link /dump /exports' shows RegisterWindowMessage and RegisterClipboardFormat have the same RVA.
Problem is that I spent way too much time on this. So my questions are:
Is there is an easy way, from within WinDBG to find out missing aliased export symbols.
Say I want to break only on RegisterClipboardFormatW. If I recall correctly, there should be a JMP instruction somewhere (in the calling module import table). How do I find that symbol? Is there a way to find this entry in all calling modules?
Since RegisterWindowMessage and RegisterClipboardFormat have the same RVA, they share the same implementation. Apparently Windows does not make any distinction between the two and both clipboard format and window messages share the same domain of identifiers.
For your first question -- how to find out which implementation function corresponds to exported function. (assuming you have symbols fixed up) First figure out RVA of the export:
C:\>link /dump /exports C:\Windows\Syswow64\user32.dll |findstr RegisterClipboardFormat
2104 24F 00020AFA RegisterClipboardFormatA
2105 250 00019EBD RegisterClipboardFormatW
Then in WinDbg find starting address where DLL is loaded from. Commands lm or lml list all modules, you just need to find the module you are after:
0:001> lml
start end module name
75460000 75560000 USER32
Using RVA as offset to the starting address, get symbol that corresponds to it:
0:002> ln 75460000+00020AFA
(75480afa) USER32!RegisterWindowMessageA | (75480b4a) USER32!MsgWaitForMultipleObjects
Exact matches:
0:002> ln 75460000+00019EBD
(75479ebd) USER32!RegisterWindowMessageW | (75479eea) USER32!NtUserGetProcessWindowStation
Exact matches:
So here we actually found out that RegisterClipboardFormat actually calls into RegisterWindowMessage.
Your second question -- how to put breakpoint only on RegisterClipboardFormat, and not on RegisterWindowMessage. In general it is impossible, because they share the same implementation. For example, your app might call GetProcAddress("RegisterClipboardFormat") and you will have hard time figuring out if it called to one function or another. However if you know that the call was made through imported function, then you can do this. All imported functions are declared in import address table in your application. If you put an access breakpoint on the entry in import address table, you can break before the call is made. This might be compiler specific, but I know that Visual C++ assigns symbolic names to entries in import address table. In this case putting breakpoint is easy:
ba r4 MyModule!_imp_RegisterClipboardFormatA

Converting data stored in Fortran 90 binaries to human readable format

In your experience, in Fortran 90, what is the best way to store large arrays in output files? Previously, I had been trying to write large arrays to ASCII text files. For example, I would do something like this (thanks to the recommendation at the bottom of the page In Fortran 90, what is a good way to write an array to a text file, row-wise?):
PROGRAM testing1
IMPLICIT NONE
INTEGER :: i, j, k
INTEGER, DIMENSION(4,10) :: a
k=1
DO i=1,4
DO j=1,10
a(i,j)=k
k=k+1
END DO
END DO
OPEN(UNIT=12, FILE="output.txt", ACTION="WRITE", STATUS="REPLACE")
DO i=1,4
DO j=1,10
WRITE(12, "(i2,x)", ADVANCE="NO") a(i,j)
END DO
WRITE(12, *)
END DO
CLOSE(UNIT=12)
END PROGRAM testing1
This works, but as pointed out by the topmost reply at In Fortran 90, what is a good way to write an array to a text file, row-wise?, writing large arrays to text files is very slow and creates files that are somewhat larger in size than is necessary. The poster there recommended instead writing to an unformatted Fortran binary, using something like:
PROGRAM testing2
IMPLICIT NONE
INTEGER :: i, j, k
INTEGER, DIMENSION(4,10) :: a
k=1
DO i=1,4
DO j=1,10
a(i,j)=k
k=k+1
END DO
END DO
OPEN(UNIT=13, FILE="output.dat", ACTION="WRITE", STATUS="REPLACE", &
FORM="UNFORMATTED")
WRITE(13) a
CLOSE(UNIT=13)
END PROGRAM testing2
This seems to work, and is indeed much faster and results in smaller file sizes, as promised by the reply here. However, what do I do if I would like to be able to later work with the data stored in Fortran binary (e.g., output.dat above) and analyze its contents? For example, what if I want to open the array stored in the binary in a program such as Microsoft Excel?
When I mentioned matlab in my previous post, the reply suggested that I open the binary as a hexadecimal file and figure out and extract the records from there. But, I am nervous that I am getting into deep water since I have no prior experience in hexadecimal sleuthing. When I asked on the matlab board (here: http://www.mathworks.com/matlabcentral/answers/12639-advice-on-reading-an-unformatted-fortran-binary-file-into-matlab) about reading Fortran files into matlab, the person there suggested that using Fortran stream might be easy. But is Fortran stream (i.e., using the directive ACCESS="STREAM" in the OPEN command) likely to be similar in time and file size to the ASCII text file that I created in my first example above?
Or, do you know if there is any other software that can automatically read Fortran binaries into some sort of human readable form? (Or, do you know of any good tutorials on either hexadecimal sleuthing or Fortran stream?)
Thank you very much for your time.
Stream is a choice independent of the choice of formatted / unformatted -- one is "access", the other "format" The default for Fortran I/O is record oriented access. The typical approach of a Fortran compiler for records (at least unformatted) to write a 4-byte record length before and after each record. (The "after" is to make reading backwards easier.) Using a hex edit you could verify these extra data items that I described and skip them in MatLab. But they are not part of the language standard and are not portable and are certainly not obvious in other languages. If you select stream and unformatted you will just get the raw sequence of bytes corresponding to your data items -- no extra data items to worry about in the other language! In my experience this output tends to be fairly easy to read in other languages (not tried in MatLab). If this is a small & simple project with portability of the files to other computers not an issue, I would probably use this approach (stream & unformatted) rather than a file format specification such as HDF5 or FITS. I'd write the array as write (13) a, as in your final example. Depending on the other language, you might have to transpose the dimensions. If this is a major and long-lived project with portability a concern, then a portable and standard file interface is worth considering.
I don't know whether any of these formats can be read from Excel. More research.... You might have to write a program to read the binary file of whatever format and output a file in a format that Excel understands.
(converting comment into an answer for posterity)
Are you specifically trying to get information into Matlab? If you are, I highly recommend HDF5. This is the portable binary format you have been looking for.
For converting a Fortran binary to HDF5, you're going to have to read in the original Fortran binary and then write out the same data to an HDF5 file. If you have the Fortran source, this should be pretty easy. Allocate your arrays, make sure you read the arrays in the same order as you wrote them and then write out your new shiny HDF5 file.
The HDF5 group has tutorials with examples in C and Fortran. There is likely an example very close to what you're trying to do. When you build HDF5, make sure to manually enable Fortran support. It is disabled by default.
%In MATLAB
fid=fopen('YOUR_FILE.direct','r'); %Fortran Direct ACCESS
frewind(fid);
tbb=ones(367,45203);
for i =1:367
temp=fread(fid,[45203],'single');
tbb(i,:)=temp;
end
fclose(fid)

How to use the 7z DLL to compress and append many small chunks of data to a file

I would like to use the 7z DLL to append small amounts of data to one compressed file. At the moment my best guess would be uncompress the 7z file, append the data, and recompress it. Obviously, this is not a good solution performance wise, if the size of the 7z file becomes large (say 1 gb) and I want do save a new chunk every second. How could I do this in a better way?
I could use any compression format supported by the 7z DLL.
Have a look at the Python LZMA bindings (LZMA is the compression algorithm name of 7z), you should do what you want without ctypes stuff.
EDIT
To be confirmed, but a quick look at py7zlib.py shows only support for reading 7z files, not writing. However in the src dir there's a pylzma_compressfile.c, so maybe there's something to do.
EDIT 2
The pylzma.compressfile function seems to be there, so fine.
This is NOT MY ANSWER.
How can I use a DLL file from Python?
I think ctypes is the way to go.
The following example of ctypes is from actual code I've written (in Python 2.5). This has been, by far, the easiest way I've found for doing what you ask.
import ctypes
# Load DLL into memory.
hllDll = ctypes.WinDLL ("c:\\PComm\\ehlapi32.dll")
# Set up prototype and parameters for the desired function call.
# HLLAPI
hllApiProto = ctypes.WINFUNCTYPE (ctypes.c_int,ctypes.c_void_p,
ctypes.c_void_p, ctypes.c_void_p, ctypes.c_void_p)
hllApiParams = (1, "p1", 0), (1, "p2", 0), (1, "p3",0), (1, "p4",0),
# Actually map the call ("HLLAPI(...)") to a Python name.
hllApi = hllApiProto (("HLLAPI", hllDll), hllApiParams)
# This is how you can actually call the DLL function.
# Set up the variables and call the Python name with them.
p1 = ctypes.c_int (1)
p2 = ctypes.c_char_p (sessionVar)
p3 = ctypes.c_int (1)
p4 = ctypes.c_int (0)
hllApi (ctypes.byref (p1), p2, ctypes.byref (p3), ctypes.byref (p4))
The ctypes stuff has all the C-type data types (int, char, short, void*, ...) and can pass by value or reference. It can also return specific data types although my example doesn't do that (the HLL API returns values by modifying a variable passed by reference).