How to find the code that generates known data? - reverse engineering

From debugging the program I only know that before clicking a button a set of known data isn't in memory (confirmed by memory search), and after clicking it the data is in memory (at a different location each time).
How can I find the code that generates this data?
One of the major problems (which might be important to know) is that it is a .NET program, which I can't analyze with Reflector because it is obfuscated. So I'm analyzing the assembly generated by .NET (in Olly / Immunity / IDA).

If it is .NET, you could debug the IL code. It is not easy, but it should be possible to find the IL instruction that writes the sequence into memory.
Try Debugging Tools for Windows (WinDbg) with the so-called SOS extension.
You could also try to generate, say, C# code from the obfuscated assemblies for debugging, but that will almost certainly not be more readable than the IL.
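To make that concrete, here is a rough sketch of such a WinDbg session with SOS. The application and method names are placeholders; on .NET 2.0/3.5 the runtime module is mscorwks rather than clr, so the first two commands change accordingly.

    windbg YourObfuscatedApp.exe
    0:000> sxe ld:clr
    0:000> g
    0:000> .loadby sos clr
    0:000> !bpmd YourObfuscatedApp.exe Some.Obfuscated.Class.SomeMethod
    0:000> g
    0:000> !clrstack
    0:000> !dumpheap -type System.Byte[]
    0:000> !do <object address>

sxe ld:clr breaks when the runtime DLL loads so that .loadby sos clr can find the matching SOS; !bpmd sets a breakpoint on a managed method, !clrstack shows the managed call stack when it hits, and !dumpheap / !do let you search the managed heap for the freshly generated data.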

Add Cheat Engine to your toolkit.
If you can get the address it writes to, you can right-click it and choose "Find out what writes to this address".
P.S. For the reverse, select an instruction in the memory view, right-click, and choose "Find out what addresses this instruction accesses".

Is it possible to view strings in Memory using IDA just like I can in OllyDbg?

I have written a simple registration program that requires a Name and a License Key in order to reach the success message. The check is case sensitive: the Name field must be Admin and the License Key must be TopSecret, and both must match for registration to succeed.
This is of course nothing complex and would never be used in a real-world application, but as I am pretty much a beginner in assembly and reverse engineering, it should serve me well as I gain more understanding of how assembly works.
My ultimate goal is to disassemble my own registration schemes and see how easy it could be for outsiders to break them, and as I get better I hope to make my applications stronger against such attacks.
In OllyDbg, when I set a breakpoint and step over using F8, I can view strings from the stack as shown below in the bottom right of the screen:
As you can see highlighted in green, I entered SOME NAME in the Name field and 123456789 in the License Key field, and OllyDbg shows both in what I assume is the memory stack.
Is it possible to do this in IDA, and if so how? I have tried opening as many subviews and debugging subviews as possible and yet I cannot see a way of stepping through and watching out for strings in memory. Is there a memory stack window in IDA just like in OllyDbg?
IDA is most useful for static analysis. When it comes to dynamic analysis and debugging, IDA is probably not my favorite tool.
To see username/password in IDA, just press SHIFT + F12 or go to View -> Open Subviews -> Strings.
There you should see the strings.
If you want to do this during dynamic analysis, put a breakpoint at the exact same location you did in OllyDbg, then look at the stack view.
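For context, the check described in the question boils down to something like the following minimal C sketch (the real program is a GUI application and is not shown in the question, so this is only an illustration). The two literals are baked into the binary, which is why they show up in IDA's Strings window, while whatever you type ("SOME NAME", "123456789") is what appears on the stack at the comparison breakpoint.

    #include <stdio.h>
    #include <string.h>

    /* Read one line of input and strip the trailing newline. */
    static void read_line(const char *prompt, char *buf, int size)
    {
        printf("%s", prompt);
        if (fgets(buf, size, stdin))
            buf[strcspn(buf, "\r\n")] = '\0';
    }

    int main(void)
    {
        char name[64], key[64];

        read_line("Name: ", name, sizeof name);
        read_line("License Key: ", key, sizeof key);

        /* Case-sensitive comparison against the hard-coded credentials. */
        if (strcmp(name, "Admin") == 0 && strcmp(key, "TopSecret") == 0)
            puts("Success");
        else
            puts("Invalid registration");

        return 0;
    }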

Retrieve binary from STM32W108 using JTAG

I want to retrieve the binary that has been loaded onto an STM32W108 using JTAG. Has anyone done this before? If so, can you post the instructions or a link to them?
Much appreciated.
If you specify what IDE you're using, then I might be able to give you more accurate instructions.
Here is the general method:
Open your project's linker-command file and check the address space of the executable (see the sketch after these steps). You can also look it up in the map file, but you'll need to build the project first. Note that the code section and the data section may reside in two separate memory regions.
Load the executable to RAM, or burn it to EPROM (whichever way you usually do it).
Search for the Memory-Save option, typically in either the View menu or the Debug menu.
Enter the memory address and size you found earlier, and click OK.
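For illustration, in a GNU-ld-style linker script the information you need sits in a MEMORY block like the one below. The origins and lengths here are placeholders, not verified STM32W108 values - take the real numbers from the device datasheet or from your own linker/map file, and note that IAR and Keil express the same information in their own file formats.

    MEMORY
    {
        FLASH (rx)  : ORIGIN = 0x08000000, LENGTH = 128K   /* code section      */
        RAM   (rwx) : ORIGIN = 0x20000000, LENGTH = 8K     /* data/bss sections */
    }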

What's the debug section in IDA Pro?

I am trying to analyze a DLL with my poor assembly skills, so forgive me if I fail at something very trivial. My problem is that, while debugging the application, I can find the code I'm looking for only during the debug session; after I stop the debugger, the address is gone. The DLL doesn't look obfuscated, as much of the code is readable. Take a look at the screenshot: the code I'm looking for is located at address 07D1EBBF in the debug376 section. By the way, where did this debug376 section come from?
So my question is: how can I find this function when not debugging?
Thanks
UPDATE
OK, as I said, as soon as I stop the debugger the code vanishes. I can't even find it by searching for the byte sequence (though I can in debug mode). When I start the debugger, the code is not disassembled immediately; I have to add a hardware breakpoint at that location, and only when the breakpoint is hit does IDA show the disassembled code. Take a look at this screenshot.
You can see the line of code I'm interested in, which is not visible unless the program is running under the debugger. I'm not sure, but I think the code is unpacked at runtime, so it isn't visible statically.
Anyway, any help would be appreciated. I want to know why that code is hidden until the breakpoint is hit (it's shown as "db 8Bh" etc.) and, if possible, how to find that address without debugging. By the way, could this be code from a different module (DLL)?
Thanks
UPDATE 2
I found out that debug376 is a segment created at runtime. So, a simple question: how can I find out where this segment came from? :)
So you see the code in the debugger window while your program is running, but you cannot find the very same opcodes in the raw hex dump once it is no longer running?
What might help you is taking a memory snapshot. Pause the program's execution near the instructions you're interested in to make sure they are there, then choose "Take memory snapshot" from the Debugger menu. IDA will ask whether to copy only the data found in the segments defined as "loader segments" (those the PE loader creates from the predefined table) or "all segments" that currently seem to belong to the debugged program (including ones created by an unpacking routine, a decryptor, or whatever). Go for "All segments" and you should be able to see the memory contents, including your debug segments (segments created or recognized while debugging), in IDA when not debugging the application.
You can view the list of segments at any time by pressing Shift+F7 or by clicking "Segments" from View > Open subviews.
Keep in mind that the program you're trying to analyze might create the segment somewhere else the next time it is loaded, to make it harder for you to work out what's going on.
UPDATE to match your second question
When a program unpacks data from somewhere, it has to copy the results somewhere else. Windows enforces memory protection and nowadays gets really nasty when you try to execute or write code at locations you're not allowed to touch. So any program, as long as we're under Windows, will somehow:
Reserve a bunch of new memory or overwrite memory it already owns. This is usually done by calling something like malloc [your code looks as if it came from a very pointer-intensive language... VB perhaps, or something object oriented]; it mostly boils down to a call to VirtualAlloc or VirtualAllocEx from Windows's kernel32.dll, see http://msdn.microsoft.com/en-us/library/windows/desktop/aa366887(v=vs.85).aspx for more detail on its calling convention.
Perhaps set up Windows exception handling for it and mark the memory range as executable, if it wasn't already when VirtualAlloc was called. This is done by calling VirtualProtect, again from kernel32.dll. See http://msdn.microsoft.com/en-us/library/windows/desktop/aa366898(v=vs.85).aspx and http://msdn.microsoft.com/en-us/library/windows/desktop/aa366786(v=vs.85).aspx for more info on that.
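To illustrate the pattern to look for, here is a minimal C sketch of what such an unpacker boils down to (the payload bytes are a made-up stand-in for decrypted code; a real packer would obviously fill the buffer from its own encrypted data):

    #include <windows.h>
    #include <string.h>

    /* Hypothetical "packed" payload - in a real packer these bytes would come
       from decrypting data inside the module. Here they are just a tiny x86
       function: mov eax, 42 / ret. */
    static const unsigned char payload[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

    int main(void)
    {
        /* 1. Ask Windows for a fresh, writable region - the VirtualAlloc call
              you would see while stepping from the entry point. */
        void *mem = VirtualAlloc(NULL, sizeof(payload),
                                 MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
        if (mem == NULL)
            return 1;

        /* 2. "Unpack" the code into the new region (a real unpacker would
              run a decrypt/copy loop here). */
        memcpy(mem, payload, sizeof(payload));

        /* 3. Flip the protection to executable - the VirtualProtect call to
              watch for. */
        DWORD oldProtect;
        if (!VirtualProtect(mem, sizeof(payload), PAGE_EXECUTE_READ, &oldProtect))
            return 1;

        /* 4. Jump into the freshly created region. */
        int (*unpacked)(void) = (int (*)(void))mem;
        return unpacked();   /* returns 42 */
    }

When stepping through a real target, this is exactly the sequence to watch for: a VirtualAlloc(Ex) call, a copy/decrypt loop into the returned buffer, a VirtualProtect call that makes it executable, and finally a jump or call into the new region - which is the moment IDA creates a runtime segment like your debug376 for it.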
So now you should step through the program, starting at its entry point (OEP), and look for calls to one of those functions, possibly with the memory protection set to PAGE_EXECUTE or one of its variants. After that there will probably be some sort of loop decrypting the memory contents and copying them to their new location. Depending on what you're interested in, you might want to just step over it by placing the cursor after the loop (usually a thick blue line in IDA) and choosing "Run to cursor" from the menu that appears when you right-click the assembler code.
If that fails, try placing a hardware breakpoint on kernel32.dll's VirtualAlloc and see whether you find anything interesting when stepping out past its return, so you end up wherever execution goes after the Alloc or Protect call.
You need to find the Relative Virtual Address (RVA) of that code; this will allow you to find it again regardless of the load address (pretty handy now that almost all systems use ASLR). The RVA is generally calculated as RVA = virtual address - module base load address, though you may also need to account for the section base. For example (the base here is purely illustrative): if the module were loaded at 0x07D00000, the instruction at 0x07D1EBBF would have an RVA of 0x1EBBF.
The alternative is to use IDA's rebasing feature to rebase the DLL to the same address every time.

How to create applications with Clozure Common Lisp (on Microsoft Windows)

I am new to Common Lisp (using Clozure Common Lisp under Microsoft Windows), though I am familiar with C and Python. So maybe these questions are stupid, but please be patient and give me some help.
1) What is the usual way to run a Common Lisp script?
Right now I have a .bat file under Windows that calls the CCL executable (wx86cl.exe) and evaluates (progn (load "my_script_full_path") (ccl:quit)) every time I want to "run" my script. Is this the standard way to "run" a Common Lisp script?
Any other suggestions about this?
2) What's the difference between (require 'cxml) and (asdf:operate 'asdf:load-op :cxml)?
They seem to do the same thing for my script; which one should I use?
3) ignore it, not a clear question
4) When I load a library (such as (require 'cxml)), it always takes time (3 or even 5 seconds) to load cxml every time I "run" my script, and there is also a lot of logging to standard output (shown below) that seems to be checking something internal. Does that mean I have to spend 3-5 seconds loading cxml every time I want to run a simple test? That seems a little inefficient, and the output is noisy. Any suggestions?
My Script
(require 'cxml) (some-code-using-cxml)
And the output
; Loading system definition from D:/_play_/lispbox-0.7/quicklisp/dists/quicklisp/software/cxml-20101107-git/cxml.asd into #<Package "ASDF0">
;;; Checking for wide character support... yes, using code points.
; Registering #<SYSTEM "cxml-xml">
......
some of my script's output
---EDIT TO ADD MORE----
5) I must say that I had almost forgotten about dumping an image to speed up the loading of Lisp libraries. So what is the normal process for developing a (maybe very simple) Lisp script?
Based on the answers I have so far, I guess maybe:
a) edit your script
b) test it via a REPL environment (SLIME is a really good choice), looping between a <==> b many times
c) dump the image to distribute it? (I am not sure about this)
6) Furthermore, what is the common way/form for us to release/distribute the final program?
For a Lisp library, we just release the source code and let other people "load/require" it.
For a Lisp program, we dump an image to distribute once we have confirmed that everything works.
Am I right?
What form is used for a real product? Do we always dump everything into an image at the end to speed up loading?
1) Yes, the normal way to run a whole programme is to use a launcher script. However, Windows has much, much better scripting support these days than just the bat interpreter: Windows Scripting Host and PowerShell ship as standard.
1a) During development, it is usual to simply type things in at the REPL (Read-Eval-Print Loop, i.e. the Lisp command line), or to use something like SLIME (for emacs or xemacs) as a development environment. If you don't know what they are, look them up. You may wish to use Cygwin to install xemacs, which will give you access to a range of linux-ish tools.
2) require is, IIRC, part of the standard. ASDF technically is not; it is a library whose job is to make other libraries work more conveniently. ASDF has a bunch of features that you will eventually want if you really get into writing large Lisp programmes.
3) Question unclear, pass.
4) See 1a) - do your tests and modifications in a running instance, thus avoiding the need to load the library more than once (just as you would in Python - you found the Python REPL, right?). In addition, when your programme is complete, you can probably dump an image which has all of your libraries pre-loaded.
Edit: additional answers:
5) Yes
6) Once you have dumped the image, you will still need to distribute the lisp binary to load the memory image. To make this transparent to the user, you will also have to have a loader script (or binary) to run the lisp binary with the image.
You don't have to start the lisp from scratch and load everything over again each time you want to run a simple test. For more efficient development, interactively evaluate code in the listener (REPL) of a running lisp environment.
For distribution, I use Zachary Beane's Buildapp tool. Very easy to install and use.
Regarding distribution -
I wrote a routine (it's at home and unavailable at the moment) that will write out the current image as a standard executable and quit. It works for both CLISP and SBCL.
I can rummage it up if you like.

What is instrumentation?

I've heard this term used a lot in the same context as logging, but I can't seem to find a clear definition of what it actually is.
Is it simply a more general class of logging/monitoring tools and activities?
Please provide sample code/scenarios when/how instrumentation should be used.
I write tools that perform instrumentation. So here is what I think it is.
DLL rewriting. This is what tools like Purify and Quantify do. A previous reply to this question said that they instrument post-compile/link. That is not correct: Purify and Quantify instrument the DLL the first time it is executed after a compile/link cycle, then cache the result so that it can be used more quickly next time around. For large applications, profiling the DLLs can be very time consuming. It is also problematic - at a company I worked at between 1998 and 2000 we had a large two-million-line app that would take 4 hours to instrument, and two of the DLLs would randomly crash during instrumentation; if either failed you had to delete both of them and start over.
In-place instrumentation. This is similar to DLL rewriting, except that the DLL is not modified and the image on disk remains untouched. The DLL functions are hooked as appropriate for the task at hand when the DLL is first loaded (either during startup or after a call to LoadLibrary/LoadLibraryEx). You can see similar techniques in the Microsoft Detours library.
On-the-fly instrumentation. Similar to in-place but only actually instruments a method the first time the method is executed. This is more complex than in-place and delays the instrumentation penalty until the first time the method is encountered. Depending on what you are doing, that could be a good thing or a bad thing.
Intermediate language instrumentation. This is what is often done with Java and .Net languages (C#, VB.Net, F#, etc). The language is compiled to an intermediate language which is then executed by a virtual machine. The virtual machine provides an interface (JVMTI for Java, ICorProfiler(2) for .Net) which allows you to monitor what the virtual machine is doing. Some of these options allow you to modify the intermediate language just before it gets compiled to executable instructions.
Intermediate language instrumentation via reflection. Java and .Net both provide reflection APIs that allow the discovery of metadata about methods. Using this data you can create new methods on the fly and instrument existing methods just as with the previously mentioned Intermediate language instrumentation.
Compile time instrumentation. This technique inserts the appropriate instructions into the application during compilation. It is not often used; a profiling mode of Visual Studio provides this feature. It requires a full rebuild and link.
Source code instrumentation. This technique modifies the source code to insert the appropriate code, usually conditionally compiled so you can turn it off.
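As a small illustration (the macro name and build flag are mine, not from any particular product), source instrumentation is often as simple as a tracing macro that compiles away unless a flag is set:

    /* trace.h - build with -DENABLE_TRACE to switch the instrumentation on */
    #ifdef ENABLE_TRACE
      #include <stdio.h>
      #define TRACE(fmt, ...) \
          fprintf(stderr, "[%s:%d] " fmt "\n", __FILE__, __LINE__, ##__VA_ARGS__)
    #else
      #define TRACE(fmt, ...) ((void)0)
    #endif

    /* In application code:  TRACE("processing order %d", order_id);  */

(##__VA_ARGS__ is a widespread compiler extension; strict C99 needs a slightly different formulation.)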
Link time instrumentation. This technique is only really useful for replacing the default memory allocators with tracing allocators. An early example of this was the Sentinel memory leak detector on Solaris/HP in the early 1990s.
The various in-place and on-the-fly instrumentation methods are fraught with danger as it is very hard to stop all threads safely and modify the code without running the risk of requiring an API call that may want to access a lock which is held by a thread you've just paused - you don't want to do that, you'll get a deadlock. You also have to check if any of the other threads are executing that method, because if they are you can't modify it.
The virtual machine based instrumentation methods are much easier to use as the virtual machine guarantees that you can safely modify the code at that point.
(EDIT - this item was added later) IAT hooking instrumentation. This involves modifying the import address table for functions imported from other DLLs/shared libraries. This type of instrumentation is probably the simplest to get working: you do not need to know how to disassemble and modify existing binaries, or do the same with virtual machine opcodes. You just patch the import table with your own function's address and call the real function from your hook. It is used in many commercial and open source tools.
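Here is a minimal, self-contained C sketch of the idea, hooking MessageBoxA in the current process purely as an example (error handling is omitted, and a production tool would also handle ordinal imports, bound imports and multiple modules):

    #include <windows.h>
    #include <stdio.h>
    #include <string.h>

    /* Pointer to the real MessageBoxA, captured when the IAT slot is patched. */
    static int (WINAPI *real_MessageBoxA)(HWND, LPCSTR, LPCSTR, UINT);

    /* Replacement function: log the call, then forward to the original. */
    static int WINAPI hooked_MessageBoxA(HWND hwnd, LPCSTR text, LPCSTR caption, UINT type)
    {
        printf("MessageBoxA called with text: %s\n", text);
        return real_MessageBoxA(hwnd, text, caption, type);
    }

    /* Walk the current module's import table and redirect one imported function. */
    static void patch_iat(const char *dll_name, const char *func_name,
                          void *hook, void **original)
    {
        BYTE *base = (BYTE *)GetModuleHandle(NULL);
        IMAGE_DOS_HEADER *dos = (IMAGE_DOS_HEADER *)base;
        IMAGE_NT_HEADERS *nt = (IMAGE_NT_HEADERS *)(base + dos->e_lfanew);
        IMAGE_DATA_DIRECTORY dir =
            nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT];
        IMAGE_IMPORT_DESCRIPTOR *imp =
            (IMAGE_IMPORT_DESCRIPTOR *)(base + dir.VirtualAddress);

        for (; imp->Name != 0; imp++) {
            if (_stricmp((const char *)(base + imp->Name), dll_name) != 0)
                continue;
            /* OriginalFirstThunk lists names, FirstThunk holds the addresses
               that actually get called (the IAT). */
            IMAGE_THUNK_DATA *names = (IMAGE_THUNK_DATA *)(base + imp->OriginalFirstThunk);
            IMAGE_THUNK_DATA *iat   = (IMAGE_THUNK_DATA *)(base + imp->FirstThunk);
            for (; names->u1.AddressOfData != 0; names++, iat++) {
                if (IMAGE_SNAP_BY_ORDINAL(names->u1.Ordinal))
                    continue;                       /* imported by ordinal, skip */
                IMAGE_IMPORT_BY_NAME *byname =
                    (IMAGE_IMPORT_BY_NAME *)(base + names->u1.AddressOfData);
                if (strcmp((const char *)byname->Name, func_name) != 0)
                    continue;
                /* Make the IAT slot writable, swap in the hook, restore it. */
                DWORD old;
                VirtualProtect(&iat->u1.Function, sizeof(void *), PAGE_READWRITE, &old);
                *original = (void *)iat->u1.Function;
                iat->u1.Function = (ULONG_PTR)hook;
                VirtualProtect(&iat->u1.Function, sizeof(void *), old, &old);
            }
        }
    }

    int main(void)   /* link with user32.lib for MessageBoxA */
    {
        patch_iat("user32.dll", "MessageBoxA",
                  (void *)hooked_MessageBoxA, (void **)&real_MessageBoxA);
        MessageBoxA(NULL, "hello", "IAT hook demo", MB_OK);
        return 0;
    }

The same walk works from an injected DLL; the only differences are which module's import table you patch and how your code gets into the target process.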
I think I've covered them all, hope that helps.
Instrumentation is usually used in dynamic code analysis.
It differs from logging in that instrumentation is usually done automatically by software, while logging requires a human to insert the logging code.
It's a general term for doing something to your code necessary for some further analysis.
Especially for languages like C or C++, there are tools like Purify or Quantify that profile memory usage, performance statistics, and the like. To make those profiling programs work correctly, an "instrumenting" step is necessary to insert the counters, array-boundary checks, etc. that are used by the profiling programs. Note that in the Purify/Quantify scenario, the instrumentation is done automatically as a post-compilation step (actually, it's an added step in the linking process) and you don't touch your source code.
Some of that is less necessary with dynamic or VM code (e.g. profiling tools like OptimizeIt are available for Java that do a lot of what Quantify does without any special linking), but that doesn't negate the concept.
An excerpt from the Wikipedia article:
In the context of computer programming, instrumentation refers to an ability to monitor or measure the level of a product's performance, to diagnose errors and to write trace information. Programmers implement instrumentation in the form of code instructions that monitor specific components in a system (for example, instructions may output logging information to appear on screen). When an application contains instrumentation code, it can be managed using a management tool. Instrumentation is necessary to review the performance of the application. Instrumentation approaches can be of two types, source instrumentation and binary instrumentation.
Whatever Wikipedia says, there is no standard, widely agreed definition of code instrumentation in the IT industry.
Consider that instrumentation is a noun derived from instrument, which has a very broad meaning.
"Code" is also everything in IT - I mean data, services, everything.
Hence, code instrumentation covers a set of applications so wide that it is hardly worth giving it a separate name ;-).
That's probably why this Wikipedia article is only a stub.