Tell IDA Pro that a memory area contains a pointer table - reverse-engineering

I have a binary image for an embedded CPU where a memory area contains a number of pointers to entry points into the binary. This is an interrupt vector table in the binary used by the CPU. How can I hint to IDA what this memory is, so it can use the entry points for its analysis?

I'll assume you already have your IDB set up using the correct processor for the loaded binary image.
If the image file is a raw file (i.e., without a header), you can define the low/high address suspiciousness limits under the Disassembly tab in Options->General.
With this set, you can set the first element in this vector to be an offset by either placing the text cursor on the first byte and pressing 'O' or 'Ctrl+O'. You can also do 'Ctrl+R' for a 'user-defined' offset (brings up a dialog with multiple options). All the various offsets can be viewed under Edit->Operand type->Offset->...
With the first element set, and with your text cursor on it, you can then hit the '*' key on your numpad to create an actual array (assuming you know how many elements are in the vector). This should apply the same operand type information to all the elements in the array. Since in this case the operand is an offset, IDA should (automatically) try and disassemble the bytes which are referenced.
Note: if an element's value falls outside the suspiciousness limits, it won't be turned into an offset.
If this is a raw image, you may wish to set up some segment info (Shift+F7) if you know the binary contains sections of pure code or pure data. I'm not sure offhand whether the 'automatic disassembly' mentioned above is done only when a segment's class is defined as 'CODE', or whether it even matters.
Note: you can always re-run analysis by pressing the colored circle icon in the toolbar (it should be green, otherwise IDA is busy doing something) or by clicking "Reanalyze program" in the Analysis tab in Options->General.


What does the element section of a wasm module look like in binary format?

I am reading the docs to study the wasm binary format. I am finding it very tough to understand the composition of the element section. Can someone please give me an example or explanation of it? Maybe similar to the one given here
The element segments section
The idea of this section is to fill the WebAssembly.Table objects with content. Initially there was only one table, and its only possible content were indexes/ids of functions. You could write:
(elem 0 (offset (i32.const 1)) 2)
It means: during the instantiation of the instance fill index 1 of table 0 with a value of 2, like tables[0][1] = 2;. Here 2 is the index of the function the table will store.
The type of element segment above is nowadays called active, and after instantiation it is no longer accessible to the application (it is "dropped"). From the spec:
An active element segment copies its elements into a table during instantiation, as specified by a table index and a constant expression defining an offset into that table
So far so good. But it was recognized that there is a need for a more powerful element segment section. Introduced were the passive and the declarative element segments.
The passive segment is not used during instantiation and remains available at runtime (until it is dropped by the application itself with elem.drop). There are instructions (from the Bulk memory and table instructions proposal, already integrated into the standard) that can be used to operate on tables and element segments.
A declarative element segment is not available at runtime but merely serves to forward-declare references that are formed in code with instructions like ref.func.
Here is the test suite, where you can see many examples of element segments (in a text format).
The binary format
Assuming that you are parsing the binary, you read one u32 and, based on its value, expect one of the formats from the specification:
0 means an active segment, like the one above, for an implicit table index of 0: an offset expression followed by a vector of function indices.
1 means a passive segment: the elemkind (0x00 for funcref at this time), followed by a vector of the respective items (function indices).
2 is an active segment with an explicit table index: the table index, an offset expression, the elemkind, and a vector of function indices.
3 means a declarative segment: the elemkind followed by a vector of function indices.
4 is an active segment (implicit table index 0) where the values in the vector are expressions, not just plain indexes (so you could have (i32.const 2) in the above example, instead of 2).
5 is a passive segment with expressions.
6 is an active segment with an explicit table index and expressions.
7 is a declarative segment with expressions.
For this reason the spec says that you can use the three lower bits of this u32 [0..7] to detect which format you have to parse. For example, the third bit (value 4) signifies "is the vector made of expressions?".
Now, all that said, it seems that the reference types proposal is not (yet) fully integrated into the specification's binary format (but it seems to be in the text one). When it is, you will be able to have something other than 0x00 (funcref) for an elemkind.
It is apparent that some of these formats overlap, but the specification evolves, and for reasons of backward compatibility with the earliest versions the format looks like this today.
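As a rough illustration of the flag decoding described above, here is a minimal Python sketch. The function name describe_elem_flag and the dictionary keys are made up for this example, and it only interprets the leading u32; it does not parse the rest of the segment.

```python
def describe_elem_flag(flag):
    """Interpret the leading u32 of an element segment (values 0..7)."""
    if not 0 <= flag <= 7:
        raise ValueError("element segment flag must be in 0..7")

    passive_or_declarative = bool(flag & 0b001)  # bit 0
    bit1 = bool(flag & 0b010)                    # bit 1: meaning depends on bit 0
    uses_expressions = bool(flag & 0b100)        # bit 2: vector holds expressions

    if passive_or_declarative:
        # For non-active segments, bit 1 separates passive from declarative.
        mode = "declarative" if bit1 else "passive"
        explicit_table = False
    else:
        # For active segments, bit 1 says whether a table index is present.
        mode = "active"
        explicit_table = bit1

    return {
        "mode": mode,
        "explicit_table_index": explicit_table,
        "expressions": uses_expressions,
    }


if __name__ == "__main__":
    for flag in range(8):
        print(flag, describe_elem_flag(flag))
```

Running it prints one line per flag value, which can be compared against the list of the eight formats above.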

MIPS: "lw" instruction path

From lecture notes:
From what I understand, don't the black dots indicate that the binary bits are being copied to both paths moving forward?
For example, instruction bits I[20-16] should be going into both Read register 2 and Write register. It's just that in the end, Read data 2 is not used because the MUX selects 1.
Did my prof choose not to draw the red lines going to the Read register 2 path just to emphasize the most significant path? Or does the black dot before Read register 2 have the ability to close off an irrelevant data path?
Your professor is only highlighting the paths that affect the result. This is normal -- if every path that contained data were highlighted, everything would be red.

Octave force deepcopy

The question
What are the ways of coercing Octave to create a real copy of an arbitrary object? Structures are the main interest.
My underlying problem
In my problem I'm obtaining a rather large structure from another function in a loop but for the current task only a few pieces of it are needed. For example:
for i=1:many
res=solver(params);
store1{i}=res.string1;
store2{i}=res.arr(:,1);
end
res is a sizable chunk of data and, due to lazy copying, those stores are references to tiny portions of bytes in that chunk. After I store those tiny portions, I don't need res itself. However, since the middle of that chunk is referenced by store, the memory area cannot be reused for the res obtained on the next iteration (they are of the same size), and thus another sizable piece of memory is allocated, which is then again pinned by a few tiny references, and so on.
Without storing parts of res, the program successfully keeps the memory consumption same after first couple of iterations.
So how do I make a complete copy of structure field?
I've tried using struct-related functions like rmfield but those keep references instead of their own objects.
I've tried to wrap the assignment in its own function:
new_struct=copy( rmfield(old_struct,"bigdata"));
function c=copy(a);
c=a;
end;
This by the way doesn't work even for arrays.
I'm interested in method applicable to any generic variable.
Minimal working example of the problem
a=cell(3,1);
for i=1:length(a);
r=rand(100000,1000);
a{i}=r(1:100,end);
whos; fflush(stdout);
pause(2);
end;
The above code will cause memory usage to gradually grow by far more than the 8.08 kb reported by whos, because the references stored in a{i} keep a bigger memory block alive than they actually need. If you force a proper copy, the problem is not present.
Numerical arrays
For numeric types addition of zero is enough to warrant a new array.
c=a+0;
Strings
For a string, which is a 1 x n char array, something along the following lines will work:
c=[a "a"](1:end-1);
Multidimensional char arrays will require concatenation with a column:
c=[a true(size(a,1),1)](:,1:end-1);
Here true is used to generate a dummy array of a size compatible with char. (There seems to be no direct way of generating a char array of arbitrary size.) char(zeros(size(a,1),1)) and char(true(size(a,1),1)) caused excess memory usage during their creation on some calls.
Note that the empty concatenation c=[a ""]; will not result in copying. It is also possible to do c=[a+0 ""]; which does result in copying due to the +0, but that one incurs type conversions to and from double, which is 8 times larger in size. (char(zeros( doesn't seem to cause that.)
Other types
In general you can use casting for the types allowed by it in order to not tailor the expressions manually as I had to do above:
typelist={"double","single","char"}; %full list of supported types is available in the link
class_of_a = typelist{ isa(a,typelist) };
c=typecast( [typecast(a,'single'); single(1)] (1:end-1), class_of_a);
Single is seemingly the smallest floating-point datatype available in Octave.
Note that logical is not supported by this method.
Copying structures
Apparently you'd have to write your own function that walks over the struct fields, copies them with the above methods, and recurses into substructs.
(As it doesn't involve the complexities relevant here, I'd rather leave that to those who actually need it, my own problem being solved by the +0's.)

PIC Assembly: Calling functions with variables

So say I have a variable, which holds a song number. -> song_no
Depending upon the value of this variable, I wish to call a function.
Say I have many different functions:
Fcn1
....
Fcn2
....
Fcn3
So for example,
If song_no = 1, call Fcn1
If song_no = 2, call Fcn2
and so forth...
How would I do this?
You should have a compare instruction in the instruction set (the post suggests you are looking for an assembly solution). Its result is usually to set a flag bit or to put a value in a register, but you need to check the instruction set for that.
The code should look something like:
load(song_no, $R1)
cmpeq($1,R1) //result is in R3
jmpe Fcn1 //jump if equal
cmpeq ($2,R1)
jmpe Fcn2
....
Hope this helps
I'm not well acquainted with the PIC, but these sorts of things are usually implemented as a jump table. In short, put pointers to the target routines in an array and call/jump to the entry indexed by your song_no. You just need to calculate the address into the array somehow, so it is very efficient. No compares necessary.
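To make the jump-table idea concrete independently of the PIC instruction set, here is a small Python sketch. The routine bodies and the names TABLE and play are placeholders invented for this example; it also assumes song numbers start at 1, as in the question.

```python
def fcn1():
    return "playing song 1"


def fcn2():
    return "playing song 2"


def fcn3():
    return "playing song 3"


# The "jump table": routine references stored in index order.
TABLE = [fcn1, fcn2, fcn3]


def play(song_no):
    """Dispatch to the routine for song_no without any chain of compares."""
    index = song_no - 1  # song numbers are 1-based, table indices 0-based
    if not 0 <= index < len(TABLE):
        raise ValueError("no routine for song number %d" % song_no)
    return TABLE[index]()


if __name__ == "__main__":
    print(play(2))
```

The bounds check matters just as much in assembly: an out-of-range index would jump past the end of the table.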
To elaborate on Jens' reply, the traditional way of doing this on 12/14-bit PICs is the same way you would look up constant data from ROM, except that instead of returning a number with RETLW you jump forward to the desired routine with GOTO. The actual jump into the jump table is performed by writing the computed address to the program counter.
Something along these lines:
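movlw high(table)   ; high byte of the table address
movwf PCLATH
movf song_no,w      ; offset into the table
addlw low(table)    ; add the low byte of the table address
btfsc STATUS,C      ; carry: the entry lies in the next 256-word page
incf PCLATH,f
movwf PCL           ; writing PCL performs the jump
table:
goto fcn1
goto fcn2
goto fcn3
.
.
.
Unfortunately there are some subtleties here.
The PIC16 only has an eight-bit accumulator while the address space to jump into is 11 bits wide. Therefore there is both a directly writable low byte (PCL) and a latched high-byte register (PCLATH). The value in the latch is applied as the high-order bits once the jump is taken.
The jump table may cross a page, hence the manual carry into PCLATH. Omit the BTFSC/INCF if you know the table will always stay within a 256-instruction page.
Because the low byte of the table address has already been added to the offset in W, the result is written to PCL with MOVWF rather than added to it, so a 0 offset jumps to the first table entry. In the simpler variant without the ADDLW you would use ADDWF PCL instead: that instruction has already been fetched, so PCL is pointing at table when it is added to, and a 0 offset again jumps to the first table entry.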
Unlike on the PIC18, each GOTO instruction fits in a single 14-bit instruction word and PCL addresses instructions, not bytes, so the offset should not be multiplied by two.
All things considered, you're probably better off searching for general PIC16 tutorials. Any of these will clearly explain how data/jump tables work, not to mention begin with the basics of how to handle the chip. Frankly, it is a particularly convoluted architecture and I would advise staying with the "free" HI-TECH C compiler unless you particularly enjoy logic puzzles or desperately need the performance.

Programmatically obtaining the number of colors used in an image

Question:
Given an image in PNG format, what is the simplest way to programmatically obtain the number of colors used in the image?
Constraints:
The solution will be integrated into a shell script running under Linux, so any solution that fits in such an environment will do.
Please note that the "color capacity of the image file" does not necessarily correspond to "colors used". Example: In an image file with a theoretical color capacity of 256 colors only say 7 colors might be in actual use. I want to obtain the number of colors actually used.
Why write your own program?
If you're doing this with a shell script, you can use the netpbm utilities:
count=$(pngtopnm png_file | ppmhist -noheader | wc -l)
The Image.getcolors method in Python Imaging Library seems to do exactly what you want.
Fun. There doesn't appear to be any guaranteed method of doing this; in the worst case you'll need to scan the image and interpret every pixel, in the best possible case the PNG will be using a palette and you can just check there.
Even in the palette case, though, you're not guaranteed that every entry is used -- so you're (at best) getting an upper bound.
http://www.libpng.org/pub/png/spec/1.1/PNG-Contents.html
.. and the chunk info here:
http://www.libpng.org/pub/png/spec/1.1/PNG-Chunks.html
Alnitak's solution is nice :) I really should get to know netpbm and imagemagick etc. better some time.
Just FYI, as a simple and very general solution: loop through each pixel in the image, getting the r,g,b color values as a single integer. Look for that integer in a list. If it's not there, add it. When finished with all the pixels, print the number of colors in the list.
If you want to count occurrences, use a hashmap/dictionary instead of a simple list, incrementing the key's value (a counter) if it is already in the dictionary. If not found, add it with a starting counter value of 1.
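A minimal Python sketch of that pixel loop follows. It assumes the image has already been decoded into a sequence of (r, g, b) tuples with 8-bit channels; decoding the PNG itself is not shown, and the name count_colors is made up for this example.

```python
from collections import Counter


def count_colors(pixels):
    """Return (number of distinct colors, per-color occurrence counts)."""
    histogram = Counter()
    for r, g, b in pixels:
        # Pack the three 8-bit channels into one integer key.
        key = (r << 16) | (g << 8) | b
        histogram[key] += 1
    return len(histogram), histogram


if __name__ == "__main__":
    # Hypothetical sample data standing in for decoded image pixels.
    sample = [(255, 0, 0), (0, 255, 0), (255, 0, 0), (0, 0, 255)]
    distinct, histogram = count_colors(sample)
    print(distinct)
```

A dictionary-based counter is used rather than a plain list, so the lookup per pixel stays cheap even for images with many colors.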