how to make Ghidra use a function's complete/original stackframe for decompiled code - reverse-engineering

I have a case where some function allocates/uses a 404 bytes temporary structure on the stack for its internal calculations (the function is self-contained and shuffles data around within that data structure). Conceptually the respective structure seems to consist of some 32-bit counters followed by an int[15] and a byte[80] array, and then an area that might or might not actually be used. Some of the generated data in the tables seems to represent offsets that are again used by the function to navigate within the temporary structure.
Unfortunately Ghidra's decompiler makes a total mess while trying to make sense of the function: In particular it creates separate "local_.." int-vars (and then uses a pointer to that var) for what should correctly be a pointer into the function's original data-structure (e.g. pointing into one of the arrays).
undefined4 local_17f;
...
dest= &local_17f;
for (i = 0xf; i != 0; i = i + -1) {
*dest = 0;
dest = dest + 1;
}
Ghidra does not seem to understand that an array based data access is actually being used at that point. Ghirda's decompiler then also generates a local auStack316[316] variable which unfortunately seems to cover only a part of the respective local data structure used by the original ASM code (at least Ghidra actually did notice that a temporary memory buffer is used). As a result the decompiled code basically uses two overlapping (and broken) shadow data structures that should correctly just be the same block of memory.
Is there some way to make Ghidra's decompiler use the complete 404 bytes block allocated by the function as an auStack404 thus bypassing Ghidra's flawed interpretation logic and actually preserve the original functionality of the ASM code?

I think I found something.. In the "Listing" view the used local-variable layout is shown as a comment under the function's header. It seems that by right clicking on a respective local-var line in that comment, "set data type" can be applied to a respective local variable. Ah, and then there is what I've been looking for under "Function/"Edit stack frame" :-)

Related

Redacted comments in MS's source code for .NET [duplicate]

The Reference Source page for stringbuilder.cs has this comment in the ToString method:
if (chunk.m_ChunkLength > 0)
{
// Copy these into local variables so that they
// are stable even in the presence of ----s (hackers might do this)
char[] sourceArray = chunk.m_ChunkChars;
int chunkOffset = chunk.m_ChunkOffset;
int chunkLength = chunk.m_ChunkLength;
What does this mean? Is ----s something a malicious user might insert into a string to be formatted?
The source code for the published Reference Source is pushed through a filter that removes objectionable content from the source. Verboten words are one, Microsoft programmers use profanity in their comments. So are the names of devs, Microsoft wants to hide their identity. Such a word or name is substituted by dashes.
In this case you can tell what used to be there from the CoreCLR, the open-sourced version of the .NET Framework. It is a verboten word:
// Copy these into local variables so that they are stable even in the presence of race conditions
Which was hand-edited from the original that you looked at before being submitted to Github, Microsoft also doesn't want to accuse their customers of being hackers, it originally said races, thus turning into ----s :)
In the CoreCLR repository you have a fuller quote:
Copy these into local variables so that they are stable even in the presence of race conditions
Github
Basically: it's a threading consideration.
In addition to the great answer by #Jeroen, this is more than just a threading consideration. It's to prevent someone from intentionally creating a race condition and causing a buffer overflow in that manner. Later in the code, the length of that local variable is checked. If the code were to check the length of the accessible variable instead, it could have changed on a different thread between the time length was checked and wstrcpy was called:
// Check that we will not overrun our boundaries.
if ((uint)(chunkLength + chunkOffset) <= ret.Length && (uint)chunkLength <= (uint)sourceArray.Length)
{
///
/// imagine that another thread has changed the chunk.m_ChunkChars array here!
/// we're now in big trouble, our attempt to prevent a buffer overflow has been thawrted!
/// oh wait, we're ok, because we're using a local variable that the other thread can't access anyway.
fixed (char* sourcePtr = sourceArray)
string.wstrcpy(destinationPtr + chunkOffset, sourcePtr, chunkLength);
}
else
{
throw new ArgumentOutOfRangeException("chunkLength", Environment.GetResourceString("ArgumentOutOfRange_Index"));
}
}
chunk = chunk.m_ChunkPrevious;
} while (chunk != null);
Really interesting question though.
Don't think that this is the case - the code in question copies to local variables to prevent bad things happening if the string builder instance is mutated on another thread.
I think the ---- may relate to a four letter swear word...

What's happening when I use for(i in object) in AS3?

To iterate over the properties of an Object in AS3 you can use for(var i:String in object) like this:
Object:
var object:Object = {
thing: 1,
stuff: "hats",
another: new Sprite()
};
Loop:
for(var i:String in object)
{
trace(i + ": " + object[i]);
}
Result:
stuff: hats
thing: 1
another: [object Sprite]
The order in which the properties are selected however seems to vary and never matches anything that I can think of such as alphabetical property name, the order in which they were created, etc. In fact if I try it a few different times in different places, the order is completely different.
Is it possible to access the properties in a given order? What is happening here?
I'm posting this as an answer just to compliment BoltClock's answer with some extra insight by looking directly at the flash player source code. We can actually see the AVM code that specifically provides this functionality and it's written in C++. We can see inside ArrayObject.cpp the following code:
// Iterator support - for in, for each
Atom ArrayObject::nextName(int index)
{
AvmAssert(index > 0);
int denseLength = (int)getDenseLength();
if (index <= denseLength)
{
AvmCore *core = this->core();
return core->intToAtom(index-1);
}
else
{
return ScriptObject::nextName (index - denseLength);
}
}
As you can see when there is a legitimate property (object) to return, it is looked up from the ScriptObject class, specifically the nextName() method. If we look at those methods within ScriptObject.cpp:
Atom ScriptObject::nextName(int index)
{
AvmAssert(traits()->needsHashtable());
AvmAssert(index > 0);
InlineHashtable *ht = getTable();
if (uint32_t(index)-1 >= ht->getCapacity()/2)
return nullStringAtom;
const Atom* atoms = ht->getAtoms();
Atom m = ht->removeDontEnumMask(atoms[(index-1)<<1]);
if (AvmCore::isNullOrUndefined(m))
return nullStringAtom;
return m;
}
We can see that indeed, as people have pointed out here that the VM is using a hash table. However in these functions there is a specific index supplied, which would suggest, at first glance, that there must then be specific ordering.
If you dig deeper (I won't post all the code here) there are a whole slew of methods from different classes involved in the for in/for each functionality and one of them is the method ScriptObject::nextNameIndex() which basically pulls up the whole hash table and just starts providing indices to valid objects within the table and increments the original index supplied in the argument, so long as the next value points to a valid object. If I'm right in my interpretation, this would be the cause behind your random lookup and I don't believe there would be any way here to force a standardized/ordered map in these operations.
Sources
For those of you who might want to get the source code for the open source portion of the flash player, you can grab it from the following mercurial repositories (you can download a snapshop in zip like github so you don't have to install mercurial):
http://hg.mozilla.org/tamarin-central - This is the "stable" or "release" repository
http://hg.mozilla.org/tamarin-redux - This is the development branch. The most recent changes to the AVM will be found here. This includes the support for Android and such. Adobe is still updating and open sourcing these parts of the flash player, so it's good current and official stuff.
While I'm at it, this might be of interest as well: http://code.google.com/p/redtamarin/. It's a branched off (and rather mature) version of the AVM and can be used to write server-side actionscript. Neat stuff and has a ton of information that gives insight into the workings of the AVM so I thought I'd include it too.
This behavior is documented (emphasis mine):
The for..in loop iterates through the properties of an object, or the elements of an array. For example, you can use a for..in loop to iterate through the properties of a generic object (object properties are not kept in any particular order, so properties may appear in a seemingly random order)
How the properties are stored and retrieved appears to be an implementation detail, which isn't covered in the documentation. As ToddBFisher mentions in a comment, though, a data structure commonly used to implement associative arrays is the hash table. It's even mentioned in this page about associative arrays in AS3, and if you inspect the AVM code as shown by Ascension Systems, you'll find exactly such an implementation. As described, there is no notion of order or sorting in typical hash tables.
I don't believe there is a way to access the properties in a specific order unless you store that order somehow.

Is having a single massive class for all data storage OK?

I have created a class that I've been using as the storage for all listings in my applications. The class allows me to "sign" an object to a listing (which can be created on the fly via the sign() method like so):
manager.sign(myObject, "someList");
This stores the index of the element (using it's unique id) in the newly created or previously created listing "someList" as well as the object in a 2D array. So for example, I might end up with this:
trace(_indexes["someList"][objectId]); // 0 - the object is the first in this list
trace(_instances["someList"]); // [object MyObject]
The class has another two methods:
find(signature:String):Array
This method returns an array via slice() containing all of the elements signed with the given signature.
findFirst(signature:String):Object
This method just returns the first object in a given listing
So to retrieve myObject I can either go:
trace(find("someList")[0]); or trace(findFirst("someList"));
Finally, there is an unsign() function which will remove an object from a given listing. This function basically:
Stores the result of pop() in the specified listing against a variable.
Uses the stored index to quickly replace the specified object with the pop()'d item.
Deletes the stored index for the specified object and updates the index for the pop()'d item.
Through all this, using unsign() will remove an object extremely quickly from a listing of any size.
Now this is all well and good, but I've had some thoughts which are making me consider how good this really is? I mean being able to easily list, remove and access lists of anything I want throughout the application like this is awesome - but is there a catch?
A couple of starting thoughts I have had are:
So far I haven't implemented support for listings that are private and only accessible via a given class.
Memory - this doesn't seem very memory efficient. Then again, neither is creating arrays for everything I want to store individually either. Just seems.. Larger.. Somehow.
Any insights?
I've uploaded the class here in case the above doesn't make much sense: https://projectavian.com/AviManager.as
Your solution seems pretty solid. If you're looking to modify it to be a bit more extensible and handle rights management, you might consider moving all those individually indexed properties to a value object for your AV elements. You could perform operations like "sign" and "unsign" internally in the VOs, or check for access rights. Your management class could monitor the collection of these VOs, pass them around, perform the method calls, and the objects would hold the state in a bit more readable format.
Really, though, this is entering into a coding style discussion. Your method works and it's not particularly inefficient. Just make sure the code is readable, encapsulated, and extensible and you're good.

temporary variables

I would like to know what's the difference when I write a temporary variable like this (these are just examples):
version1
for each tempEnemy in enemyManager.enemies {
var tempX:int = tempEnemy.x;
}
or this:
version2
for each tempEnemy in enemyManager.enemies {
tempEnemy.oldX = tempEnemy.x;
}
What's wrong and right? Currently I write it like version 2 and I'm not sure if I should change it to version 1. Can someone help me out with this please? I know most developers write like version 1 but I'm a little bit confused because I am totally unaware about version 1. If I use version 1 does that mean that my value is stored explicitly in a temporary variable that is cleared in every cycle?
Also...
Adding the :variableType (int, String, Number etc) aids in code hinting and debugging.
In version 1 the declaration:
var tempX:int
defines a variable that lasts only as long as that iteration of the for (or for-each) loop it's in. Each iteration tempX is defined, given a value from an Enemy object, and at the end it is left for garbage collection.
In version 2, you reference two variables attached to an Enemy object referenced by the temporary variable named tempEnemy.
In both versions the reference to the Enemy object, tempEnemy, is reassigned the next iteration's Enemy object.
Each method has its advantages. From a memory standpoint, version 2 is probably better, since it changes an existing variable over and over rather than creating a new variable that's discarded at the end of each iteration. On the other hand, version 1 doesn't require you to have oldX defined in its class variables, which can often get mucky enough without these sorts of variables.
Best practices with code are based off of (a) working with other programmers, who need to be able to read and understand the code, and (b) leaving a project and coming back to it later, where you'll need to be able to read and understand your own code. For short projects you don't plan on sharing, version 2 is okay (and probably more memory-efficient), but any large project should use something more like version 1.
Another consideration is, are you going to use that variable anywhere other than the function where it is defined(set)? If not, you don't need to store it in the object, which points again to version 1.

Organizing Notebooks & Saving Results in Mathematica

As of now I use 3 Notebook :
Functions
Where I have all the functions I created and call in the other Notebooks.
Transformation
Based on the original data, I compute transformations and add columns/List
When data is my raw data, I then call :
t1data : the result of the first transformation
t2data : the result of the second transformation
and so on,
I am yet at t20.
Display & Analysis
Using both the above I create Manipulate object that enable me to analyze the data.
Questions
Is there away to save the results of the Transformation Notebook such that t13data for example can be used in the Display & Analysis Notebooks without running all the previous computations (t1,t2,t3...t12) it is based on ?
Is there a way to use my Functions or transformed data without opening the corresponding Notebook ?
Does my separation strategy make sense at all ?
As of now I systematically open the 3 and have to run them all before being able to do anything, and it takes a while given my poor computing power and yet inefficient codes.
Saving variable states: can be done using DumpSave, Save or Put. Read back using Get or <<
You could make a package from your functions and read those back using Needs or <<
It's not something I usually do. I opt for a monolithic notebook containing everything (nicely layered with sections and subsections so that you can fold open or close) or for a package + slightly leaner analysis notebook depending on the weather and some other hidden variables.
Saving intermediate results
The native file format for Mathematica expressions is the .m file. This is human readable text format, and you can view the file in a text editor if you ever doubt what is, or is not being saved. You can load these files using Get. The shorthand form for Get is:
<< "filename.m"
Using Get will replace or refresh any existing assignments that are explicitly made in the .m file.
Saving intermediate results that are simple assignments (dat = ...) may be done with Put. The shorthand form for Put is:
dat >> "dat.m"
This saves only the assigned expression itself; to restore the definition you must use:
dat = << "dat.m"
See also PutAppend for appending data to a .m file as new results are created.
Saving results and function definitions that are complex assignments is done with Save. Examples of such assignments include:
f[x_] := subfunc[x, 2]
g[1] = "cat"
g[2] = "dog"
nCr = #!/(#2! (# - #2)!) &;
nPr = nCr[##] #2! &;
For the last example, the complexity is that nPr depends on nCr. Using Save it is sufficient to save only nPr to get a fully working definition of nPr: the definition of nCr will automatically be saved as well. The syntax is:
Save["nPr.m", nPr]
Using Save the assignments themselves are saved; to restore the definitions use:
<< "nPr.m" ;
Moving functions to a Package
In addition to Put and Save, or manual creation in a text editor, .m files may be generated automatically. This is done by creating a Notebook and setting Cell > Cell Properties > Initialization Cell on the cells that contain your function definitions. When you save the Notebook for the first time, Mathematica will ask if you want to create an Auto Save Package. Do so, and Mathematica will generate a .m file in parallel to the .nb file, containing the contents of all Initialization Cells in the Notebook. Further, it will update this .m file every time you save the Notebook, so you never need to manually update it.
Sine all Initialization Cells will be saved to the parallel .m file, I recommend using the Notebook only for the generation of this Package, and not also for the rest of your computations.
When managing functions, one must consider context. Not all functions should be global at all times. A series of related functions should often be kept in its own context which can then be easily exposed to or removed from $ContextPath. Further, a series of functions often rely on subfunctions that do not need to be called outside of the primary functions, therefore these subfunctions should not be global. All of this relates to Package creation. Incidentally, it also relates to the formatting of code, because knowing that not all subfunctions must be exposed as global gives one the freedom to move many subfunctions to the "top level" of the code, that is, outside of Module or other scoping constructs, without conflicting with global symbols.
Package creation is a complex topic. You should familiarize yourself with Begin, BeginPackage, End and EndPackage to better understand it, but here is a simple framework to get you started. You can follow it as a template for the time being.
This is an old definition I used before DeleteDuplicates existed:
BeginPackage["UU`"]
UnsortedUnion::usage = "UnsortedUnion works like Union, but doesn't \
return a sorted list. \nThis function is considerably slower than \
Union though."
Begin["`Private`"]
UnsortedUnion =
Module[{f}, f[y_] := (f[y] = Sequence[]; y); f /# Join###] &
End[]
EndPackage[]
Everything above goes in Initialization Cells. You can insert Text cells, Sections, or even other input cells without harming the generated Package: only the contents of the Initialization Cells will be exported.
BeginPackage defines the Context that your functions will belong to, and disables all non-System` definitions, preventing collisions. (There are ways to call other functions from your package, but that is better for another question).
By convention, a ::usage message is defined for each function that it to be accessible outside the package itself. This is not superfluous! While there are other methods, without this, you will not expose your function in the visible Context.
Next, you Begin a context that is for the package alone, conventionally "`Private`". After this point any symbols you define (that are not used outside of this Begin/End block) will not be exposed globally after the Package is loaded, and will therefore not collide with Global` symbols.
After your function definition(s), you close the block with End[]. You may use as many Begin/End blocks as you like, and I typically use a separate one for each function, though it is not required.
Finally, close with EndPackage[] to restore the environment to what it was before using BeginPackage.
After you save the Notebook and generate the .m package (let's say "mypackage.m"), you can load it with Get:
<< "mypackage.m"
Now, there will be a function UnsortedUnion in the Context UU` and it will be accessible globally.
You should also look into the functionality of Needs, but that is a little more advanced in my opinion, so I shall stop here.