Native Client inner/outer sandbox - google-chrome

I am working with the Chrome Native Client and have some difficulties with the following points:
As I understand it so far, the first 64 KB of the 256 MB NaCl segment are dedicated to the inner sandbox. This region contains the trampoline and the springboard, which transfer control from untrusted code to trusted code and vice versa. When I am in these first 64 KB, can I jump into the middle of a 32-byte instruction? For example, if I have a 32-byte instruction in the trampoline, can I jump from this instruction to the middle (not 32-byte aligned) of another 32-byte instruction in the trampoline? Are all the instructions in the trampoline and the springboard also 32-byte aligned?
Can I combine several x86 instructions into one 32-byte-aligned NaCl instruction (for example, putting AND 0xffffffe0 %eax and JMP EAX in one 32-byte-aligned NaCl instruction)?
I understand that the service runtime deals with process creation, memory management, etc., and that it is accessed through the trampoline. How exactly does the trampoline code reach the service runtime? Where is the service runtime located in memory? When the service runtime finishes, can it return to a non-32-byte-aligned instruction in the springboard?
What is the actual duty of the outer sandbox? How does it monitor and filter system calls? If there is a bug in the validator of the inner sandbox, in which cases can the outer sandbox catch an illegal/malicious instruction?
Thank you all

I'm not 100% sure off the top of my head, but I would guess from looking just at the directory layout of the source that they are both part of the trusted service runtime code (they are in the src/trusted/service_runtime directory), and are therefore built with the system compiler and not subject to validation.
Yes, there is no limit on the number of instructions in a 32-byte bundle. The restriction is just that no instruction (or multi-instruction sandboxing sequence such as the one you mentioned for indirect jumps) may cross the bundle boundary. So in your example, both of those instructions would be required to be in the same bundle.
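For illustration, here is my own rough sketch (not actual NaCl toolchain output) of what that two-instruction sandboxing sequence looks like for a 32-bit indirect jump; the validator requires the AND and the JMP to sit in the same 32-byte bundle so the mask can never be separated from the jump:

/* Sketch only, 32-bit x86: NaCl-style masked indirect jump.
 * The AND clears the low 5 bits so the target is forced to a 32-byte
 * bundle boundary; the JMP then goes through the masked register.
 * This function never returns, so clobbering eax is tolerable here. */
void sandboxed_jump(void (*target)(void))
{
    __asm__ volatile (
        "andl $0xffffffe0, %%eax\n\t"   /* mask target to 32-byte alignment */
        "jmp  *%%eax"                   /* indirect jump through the masked eax */
        :
        : "a"(target));
}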
Again I'm a bit fuzzy on the details of how the trampolines work but when control transfers from the trampoline, it ends up in the service runtime, which is just ordinary machine code built according to the native ABIs for the OS. So the service runtime can use any system calls (at least any allowed by the outer sandbox) and can read or execute any part of the untrusted code.
The outer sandbox is, strictly speaking, a defense in depth (i.e. the inner sandbox is in theory sufficient to contain the untrusted code). It filters system calls in different ways on different OSes. In Chrome's embedding of NaCl, the outer sandbox is the same implementation as the Chrome sandbox used for the renderer and GPU processes.


Where is hardware exception handling entry / exit code stored

I know this question seems very generic as it can depend on the platform, but I understand that with procedure/function calls, the assembler code to push the return address on the stack, set up local variables, etc. can be part of either the caller or the callee.
When a hardware exception or interrupt occurs though, the program counter gets the address of the exception handler via the exception table, but where is the actual code that stores the state, return address, etc.? Or is this done automatically at the hardware level for interrupts and exceptions?
Thanks in advance
Since you are asking about ARM and tagged microcontroller, you might be talking about the ARM7TDMI but are probably talking about one of the Cortex-Ms. These work differently from the full-sized ARM architecture. As documented in the architectural reference manual associated with these cores (the ARMv6-M or ARMv7-M, depending on the core), the hardware conforms to the ABI, plus extra work for an interrupt: the return address, the xPSR and registers r0-r3, r12 and lr are all put on the stack by hardware, which is unusual for an architecture to do. R14 (lr), instead of getting the return address, gets an invalid address with a specific pattern, which is all part of the architecture. Unlike other processor IP, the address spaces on the Cortex-Ms are encouraged or dictated by ARM; that is why you usually see RAM start at 0x20000000 on these parts and flash below that, with some exceptions where RAM is placed in the "executable" range, pretending to be Harvard when it is really modified Harvard. This ties in with the 0xFFFxxxxx link-register return value: depending on the manual, they either gloss over the return address or go into detail about what the patterns you find there mean.
Likewise, the layout of the vector table is spelled out: the first 16 entries are system/ARM exceptions, and the interrupts follow after that, where there can be up to 128 or 256 possible interrupts. But you have to look at the chip vendor's documentation (not ARM's) to see how many they exposed and what is tied to what. If you are not using those interrupts, you don't have to leave a huge hole in your flash for vectors; just use that flash for your program (so long as you ensure you will never fire that exception or interrupt).
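To make that concrete, here is a minimal sketch (the symbol names and the trimmed-down table are my own simplification, not any particular vendor's startup file) of what this looks like in C on a Cortex-M: the vector table is just an array of addresses placed by the linker, and because the hardware does the register stacking and unstacking, the handlers are plain C functions with no special prologue.

#include <stdint.h>

extern uint32_t _estack;          /* top-of-stack symbol from the linker script (assumed name) */
void Reset_Handler(void);

void SysTick_Handler(void)        /* ordinary C function: the hardware already stacked
                                     r0-r3, r12, lr, pc and xPSR before calling it */
{
    /* handle the tick */
}

/* Vector table: word 0 is the initial stack pointer, entries 1-15 are the
   system/ARM exceptions, then the chip vendor's interrupts follow. */
__attribute__((section(".isr_vector")))
void (* const vector_table[])(void) = {
    (void (*)(void))&_estack,     /* initial MSP loaded by hardware at reset */
    Reset_Handler,                /* entry 1: reset */
    /* NMI, HardFault, ... entries 2-14 omitted in this sketch */
    SysTick_Handler,              /* would be entry 15 in a real table */
    /* vendor interrupt vectors follow from entry 16 */
};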
For function calls, which occur at well defined (synchronous) locations in the program, the compiler generates executable instructions to manage the return address, registers and local variables. These instructions are integrated with your function code. The details are hardware and compiler specific.
For a hardware exception or interrupt, which can occur at any location (asynchronous) in the program, managing the return address and registers is all done in hardware. The details are hardware specific.
Think about how a hardware exception/interrupt can occur at any point during the execution of a program. And then consider that if a hardware exception/interrupt required special instructions integrated into the executable code then those special instructions would have to be repeated everywhere throughout the program. That doesn't make sense. Hardware exception/interrupt management is handled in hardware.
The "code" isn't software at all; by definition the CPU has to do it itself internally because interrupts happen asynchronously. (Or for synchronous exceptions caused by instructions being executed, then the internal handling of that instruction is what effectively triggers it).
So it's microcode or hardwired logic inside the CPU that generates the stores of a return address on an exception, and does any other stuff that the architecture defines as happening as part of taking an exception / interrupt.
You might as well ask where the code is that pushes a return address when the call instruction executes, on x86 for example, where the call instruction pushes return info onto the stack instead of writing it to a link register (the way most RISCs do).

Creating Universal binaries for OpenCL Kernel for Intel GPU

We write OpenCL C code, compile it with clCreateProgramWithSource, and use clGetProgramInfo to get the binary. This binary is then embedded in the product binary, which uses clCreateProgramWithBinary when initializing.
We create a .h file and include it in the source file; the content of the .h file is the binary generated from compiling the OpenCL C kernel.
The issue with the above approach is that the compatibility of the binary is expected to break with any minor/major change in OpenCL, and it will most likely break across vendors. We would need to generate the OpenCL kernel binary for each vendor and each OpenCL release.
Because the OpenCL kernel binary is integrated into the project in header form, if the binary turns out to be incompatible we are in no position to replace it, and project initialization fails.
Expected Solution
The OpenCL C source is proprietary to the company and cannot be shared with the customers.
Since the OpenCL kernel binary is integrated with the project library, we need to understand whether it is possible to generate a binary which can re-organize itself during clCreateProgramWithBinary to fit the target platform.
If it is absolutely necessary to generate the binary once for each vendor and each OpenCL minor/major revision and store it to disk (which would be done on the end user's machine), how can we protect the source, which is proprietary to the company (is SPIR the only option)?
I have already looked at Universal binaries for OpenCL, but it suggests that SPIR also takes a long time to compile, so it might not be the solution I am looking for, since init time is also important.
In practice the Intel Gen binary format can change with driver changes for the same platform/hardware (e.g. for bug-fix workarounds and performance improvements). Hence, the bits returned by clGetProgramInfo are only sure to work in clCreateProgramWithBinary on the same device × driver (etc.) combination. Sadly, this means that the binary path is a poor match for the intellectual-property security problem.
SPIR sort of splits the difference, as it is hardware independent while still being harder to reverse engineer. If startup performance is important, you can always try the clCreateProgramWithBinary path first; just be able to fall back to SPIR should the binary load fail (meaning the driver changed or something).
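If you go that route, the load path could look roughly like this (a sketch with error handling mostly omitted; the function name and the fallback-to-source path are mine, and you would substitute your SPIR module for the plain source string if that is what you ship):

#include <CL/cl.h>
#include <stddef.h>

/* Try the cached device binary first; rebuild from source (or SPIR) if the
   driver rejects it, then re-cache the fresh binary via clGetProgramInfo. */
cl_program load_kernel_program(cl_context ctx, cl_device_id dev,
                               const unsigned char *binary, size_t binary_len,
                               const char *source, size_t source_len)
{
    cl_int err, binary_status;
    cl_program prog = clCreateProgramWithBinary(ctx, 1, &dev, &binary_len,
                                                &binary, &binary_status, &err);
    if (err == CL_SUCCESS && binary_status == CL_SUCCESS &&
        clBuildProgram(prog, 1, &dev, NULL, NULL, NULL) == CL_SUCCESS)
        return prog;   /* the cached binary still matches this device/driver */

    /* Binary load failed (e.g. after a driver update): fall back to the slow path. */
    prog = clCreateProgramWithSource(ctx, 1, &source, &source_len, &err);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    return prog;
}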

Is non-aligned pinned memory allowed with CUDA?

Is passing a pointer that is not page-aligned to cudaHostRegister allowed/portable? I'm asking because the simpleStream example does manual page alignment, but I can't find this requirement in the documentation. Maybe it's a portability problem (similar to mlock() supporting non-aligned addresses on Linux, while POSIX does not guarantee it in general)?
I changed the bandwidth test to use non-aligned but registered memory, and it performs the same as memory returned by cudaHostAlloc. Since I use these pinned buffers to overlap copies and computation, I'm also interested in whether non-alignment prevents that (so far I could not detect any performance loss).
All my tests were on x86-64 linux.
Maybe it's a portability problem (similar to mlock() supporting non-aligned on linux, but POSIX does not in general)?
Both Linux's mlock and Windows' VirtualLock will lock all pages containing a byte or more of the address range you want to lock; manual alignment is not needed. But as you noted, POSIX allows an implementation to require the argument of mlock to be page-aligned. This is notably the case with OS X's mlock, which rounds a page-unaligned address up to the next page boundary, therefore not locking the entirety of the address range.
The documentation of cudaHostRegister makes no mention of any alignment constraint on its arguments. As such, a consumer of this API would be within their rights to expect that any alignment concern on the underlying platform is the responsibility of cudaHostRegister, not the user. But without seeing the source of cudaHostRegister, it's impossible to tell whether this is actually the case. Since the sample deliberately takes care of alignment manually, it is possible that cudaHostRegister has no such transparent alignment-fixing functionality.
Therefore, yes, it is likely the sample was written to ensure its portability across OSes supported by CUDA (Windows, Linux, Mac OS X).
I just found the following lines in the old 4.0 NVIDIA Library documentation... Maybe they can be helpful for future questions:
The CUDA context must have been created with the cudaMapHost flag in order for the cudaHostRegisterMapped flag to have any effect.
The cudaHostRegisterMapped flag may be specified on CUDA contexts for devices that do not support mapped pinned memory. The failure is deferred to cudaHostGetDevicePointer() because the memory may be mapped into other CUDA contexts via the cudaHostRegisterPortable flag.
and finally
The pointer ptr and size size must be aligned to the host page size (4 KB).
so it is about the host page size.
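For reference, the manual alignment the sample does boils down to something like this (a sketch; the helper name is mine, and on Linux you could query the real page size with sysconf(_SC_PAGESIZE) instead of hard-coding the 4 KB the docs mention):

#include <cuda_runtime.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* host page size assumed by the documentation quoted above */

/* Round the pointer down and the size up so the registered range covers
   whole pages, then pin the enclosing region. */
cudaError_t register_page_aligned(void *ptr, size_t size)
{
    uintptr_t start = (uintptr_t)ptr & ~(uintptr_t)(PAGE_SIZE - 1);
    uintptr_t end   = ((uintptr_t)ptr + size + PAGE_SIZE - 1)
                      & ~(uintptr_t)(PAGE_SIZE - 1);
    return cudaHostRegister((void *)start, (size_t)(end - start),
                            cudaHostRegisterDefault);
}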

Memory access exception handling with MinGW on XP

I am trying to use the MinGW GCC toolchain on XP with some vendor code from an embedded project that accesses high memory (> 0xFFFF0000), which is, I believe, beyond the virtual address space allowed for 'civilian' processes in XP.
I want to handle the memory access exceptions myself in some way that permits execution to continue at the instruction following the exception, i.e. ignore it. Is there some way to do that with MinGW, or with the MS toolchain?
The vastly simplified picture is thus:
/////////////
// MyFile.c
void MyFunc(void) {
    VendorFunc_A();
}

/////////////////
// VendorFile.c
void VendorFunc_A(void) {
    VendorFunc_DoSomeDesirableSideEffect();
    VendorFunc_B();
    VendorFunc_DoSomeMoreGoodStuff();
}

int VendorFunc_B(void) {
    volatile int *pHW_Reg = (volatile int *)0xFFFF0000;
    *pHW_Reg = 1; // Mem Access EXCEPTION HERE
    return 0;     // I want to continue here
}
More detail:
I am developing an embedded project on an Atmel AVR32 platform with freeRTOS, using the AVR32-gcc toolchain. It is desirable to develop/debug high-level application code independent of the hardware (and of the slow AVR32 simulator). Various gcc, makefile and macro tricks permit me to build my AVR32/freeRTOS project in the MinGW/Win32 freeRTOS port environment, and I can debug in eclipse/gdb. But the high-memory HW access in the (vendor-supplied) AVR32 code crashes the MinGW exe (due to the memory access exception).
I am contemplating some combination of these approaches:
1) Manage the access exceptions in SW. Ideally I'd be creating a kind of HW simulator, but that'd be difficult and involve some gnarly assembly code, I think. A lot of the exceptions can likely just be ignored.
2) Create a modified copy of the AVR32 header files so as to relocate the HW register #defines into user-process address space (and create some structs and linker sections that commit those areas of virtual memory).
3) Conditional compilation of the function calls that result in high-mem/HW access, or alternatively more macro tricks, so as to minimize code cruft in the 'real' HW target code. (There are other developers on this project.)
Any suggestions or helpful links would be appreciated.
This page is on the right track, but seems overly complicated, and it is C++, which I'd like to avoid. But I may try it yet, absent other suggestions.
http://www.programmingunlimited.net/siteexec/content.cgi?page=mingw-seh
You need to figure out why the vendor code wants to write 1 to address 0xFFFF0000 in the first place, and then write a custom VendorFunc_B() that emulates this behavior. It is likely that 0xFFFF0000 is a hardware register that does something special when written to (e.g. changes the baud rate on a serial port, or powers up the laser, or ...). When you know what will happen when you write to this register on the target hardware, you can rewrite the vendor code to do something appropriate in the Windows code (e.g. write the string "Starting laser" to a log file). It is safe to assume that writing 1 to address 0xFFFF0000 on Windows XP is not the right thing to do, and the Windows XP memory protection system detects this and terminates your program.
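As a trivial sketch of that idea (using the names from the question's simplified example, and a made-up log message, since only you know what the register really does):

#include <stdio.h>

/* Host-side stand-in for the vendor function: instead of poking the
   hardware register, record what the target code would have done. */
int VendorFunc_B(void)
{
    /* on target: *(volatile int *)0xFFFF0000 = 1; */
    fprintf(stderr, "SIM: write 1 -> HW reg 0xFFFF0000 (peripheral enable?)\n");
    return 0;
}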
I had a similar issue recently, and this is the solution I settled on:
Trap memory accesses inside a standard executable built with MinGW
First of all, you need to find a way to remap those address ranges (maybe with some undef/define combos) to some usable memory. If you can't do this, maybe you can hook the resulting seg-fault and handle the write yourself.
I also use this to "simulate" some specific HW behaviour inside a single executable, for some already-written code. In my case, however, I found a way to redefine all the register-access macros early on.
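A rough sketch of that macro trick, with hypothetical register names (real AVR32 headers define the peripherals differently, so treat this only as the shape of the idea):

#include <stdint.h>

#ifdef SIMULATE_HW                            /* defined only for the MinGW/Win32 build */
extern volatile uint32_t sim_hw_regs[1024];   /* ordinary memory standing in for the peripherals */
#define HW_STATUS_REG   (sim_hw_regs[0])
#define HW_BAUD_REG     (sim_hw_regs[1])
#else                                         /* real target: registers live in high memory */
#define HW_STATUS_REG   (*(volatile uint32_t *)0xFFFF0000u)
#define HW_BAUD_REG     (*(volatile uint32_t *)0xFFFF0004u)
#endif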

What is an ABI (Application Binary Interface)?

This is what wikipedia says:
In computer software, an application binary interface (ABI) describes the low-level interface between an application (or any type of) program and the operating system or another application.
ABIs cover details such as data type, size, and alignment; the calling convention, which controls how functions' arguments are passed and return values retrieved; the system call numbers and how an application should make system calls to the operating system; and in the case of a complete operating system ABI, the binary format of object files, program libraries and so on. A complete ABI, such as the Intel Binary Compatibility Standard (iBCS), allows a program from one operating system supporting that ABI to run without modifications on any other such system, provided that necessary shared libraries are present, and similar prerequisites are fulfilled.
I guess that an ABI is a convention or standard, and compilers/linkers use this convention to produce object code. Is that right? If so, who makes these conventions (companies or some organization)? What was it like when there were no ABIs? Are there documents about these ABIs that we can refer to?
You're correct about the definition of an ABI, up to a point. The classic example is the syscall interface in Linux (and other UNIXes).
They are a standard way for code to request the operating system to carry out certain duties.
As such, they're decided by the people who wrote the OS or, where syscalls have been added later, by whoever added them (in cases where the OS allows this). For example, the Linux syscall interface on x86 states that you load the syscall number into eax, with the other parameters placed in ebx, ecx and so on, depending on the syscall you're making (eax).
Typically, it's not the compiler or linker that does the work of interfacing; rather, it's the libraries provided for the language you're using.
Returning to Linux, the GNU C library contains code for fopen (for example) which eventually calls the relevant syscall to perform the lower-level tasks (syscall number 5, open). A list of the syscalls can be found in this PDF file.
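As a minimal illustration of that register convention (32-bit x86 Linux only; normally libc hides this behind open()/fopen()):

#include <fcntl.h>   /* for O_RDONLY etc. */

/* Invoke the open syscall directly: number 5 in eax, path/flags/mode in
   ebx/ecx/edx, result (fd or negative errno) back in eax. */
long raw_open(const char *path, int flags, int mode)
{
    long fd;
    __asm__ volatile ("int $0x80"
                      : "=a"(fd)
                      : "a"(5), "b"(path), "c"(flags), "d"(mode)
                      : "memory");
    return fd;
}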
Specification is a more suitable term than convention, as convention is a loose term for a widely accepted practice, whereas a specification is well defined.
You are right. The specification is made by a standardization body. Take a look at the POSIX specification, which is supported by Windows; compiler/build tool-chains such as gcc assume OSes adhere to it, and even the Linux kernel adheres to it partially (almost exactly).
Before ABIs? Even today, firmware is hand-crafted as new chips come along for set-top boxes and other such devices with embedded systems.
The documentation there is the digital-logic content in the data sheet for the chip, which is to be programmed in assembly language; for a higher-level language, the cross-compiler tool-chain documentation gives away the assumptions that should be part of the ABI.
Well, the concept of an ABI was presumably conceived to support binary compatibility of your program across operating systems and machine architectures. Suppose you wrote a program on some operating-system distribution running on the x86 architecture. For a programmer, the most important thing is that this program should run exactly the same on any other machine offering the same interface, whether on the same or a different architecture.
Every machine architecture defines its own way in which the operating-system kernel talks to the outside world, i.e. user-space programs, so every architecture defines a different set of system calls, machine registers, ways those registers are used, ways software interrupts are handled by the kernel, and so on. The ABI is what pins these things down for compiling, linking, byte ordering and so on. System programmers have had hard luck defining a uniform ABI for the same operating system across different architectures, and that is why every machine architecture has its own, and you need to compile your programs to conform to the format those machines expect.