Function Call: Labels into memory addresses

I have difficulty understanding the correct sequence of events.
When a program written in a high-level language is compiled, it is translated into machine code.
Then, only when the program is run, it is loaded into RAM, in the code segment.
At this point, each instruction in the program will be at a specific memory address.
When a function is called in assembly, the Call statement is typically followed by a label.
I assume this label will be replaced with the function's memory address by the compiler.
And this is where I absolutely can't understand.
If the instructions are loaded into memory only when the program is running, and each instruction only then obtains its own memory address, how does the compiler know the memory address to which the label corresponds?
If the function is not yet in memory, how can the program, compiled into binary code where the labels are no longer available, know the memory address corresponding to that label, where the function will be loaded at the moment of execution? I am a bit confused. Help me.

A program contains several "sections" (some are optional):
a section that holds code, usually referred to as the Text section
a section that holds initial values for mutable global data
a section that holds immutable constants, usually called rodata
a section that has a set of relocation records
A section is stored as a contiguous chunk or block of memory in the program file on disc.
The loader creates memory chunks and loads the code, data, and rodata into those; a stack will also have been created, depending on the OS, either by the loader or by the forking of the parent process that creates the child process.
Knowing the final addresses, the loader also processes the relocation records.  These relocations describe where in the text and data sections updates are needed for the final addresses of the sections loaded into memory.
The relocation mechanism is general purpose, as: code can refer to code, code can refer to data, data can refer to code, and data can refer to data.
A single relocation record describes a reference that needs to be updated.  Each record describes:
a referring source — at what offset in the text or data section to make an address update
a referring target — which section is being referred to: code or data
what kind of update to make (some architectures have complex instruction encodings)
Some updates are for ordinary pointers, while others are for instructions.  Instruction set architectures that have complex instruction offset/immediate encodings, like MIPS, RISC V, and HP-PA, need the record to specify the immediate encoding method.
Usually the referrer already has an offset in place, so the update is a matter of adding the base address of the section being referred to, to the offset already present at the referring source.
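As a rough illustration, here is a minimal sketch in C of how a loader might apply such records. The struct layout and section numbering are invented for this sketch; real formats such as ELF define many more relocation types.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified relocation record. */
struct reloc {
    uint32_t source_offset;   /* where in the referring section to patch */
    int      target_section;  /* which section's base address to add     */
};

/* Apply absolute relocations to a section image, once the loader has
   decided the final base address of every section. */
void apply_relocs(uint8_t *section, const struct reloc *recs, size_t n,
                  const uint32_t *section_bases)
{
    for (size_t i = 0; i < n; i++) {
        /* The referring source already holds an offset; add the base of
           the section being referred to, as described above. */
        uint32_t value;
        memcpy(&value, section + recs[i].source_offset, sizeof(value));
        value += section_bases[recs[i].target_section];
        memcpy(section + recs[i].source_offset, &value, sizeof(value));
    }
}
```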
Other metadata in the program describes where to start, e.g. the initial program counter, which would be given as an offset into the text section.
Most processors today support (as fuz describes) position independent code (PIC).  This is typically done via pc-relative addressing.  The processor performs branches and calls within the single text section using pc-relative addressing modes, and thus, no relocation records are required for these instructions.
Dynamically loaded libraries add complexity, since each DLL, as well as the main program to run, has the format of a program, i.e. each will have its own sections; each has its own text section.  The relocations will also be capable of describing references to symbol imports, supported by additional sections holding symbol names, imports, and exports.
Object files (compiler output, pre-linking) typically follow this format as well.  A single object file has these sections, with relocation records, symbol names, imports, and exports.  The linker's job is to merge object files into a single program or a larger object file.  During the merge the linker resolves some relocations, but it cannot necessarily resolve all of them, so some may remain for the OS loader to resolve.
Let's imagine that, on a system using PIC, there is a reference: a call (code-to-code) from one object file to another, and that the linker merges these object files.  There will be a relocation record in the caller that refers to an imported symbol name (and, in the other object file, an export of a symbol defined as some offset within its text section).  Once the two object files' sections are merged (e.g. by simply concatenating them into one larger text section), that call is now an intra-section reference, and the linker can compute the delta between the addresses of the caller and callee, which will not change with future linking or loading.  The linker will adjust the offset/immediate in the call instruction with that delta and, knowing this reference is now resolved, omit this relocation record from the merge.
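To make the delta computation concrete, here is a hedged sketch in C. It assumes a call encoding whose displacement is measured from the end of a 4-byte immediate (as with the x86-64 rel32 field); other ISAs measure from the instruction start or count in words.

```c
#include <stdint.h>
#include <string.h>

/* Sketch: resolve a pc-relative call once caller and callee live in the
   same merged text section.  Both offsets are section-relative, so the
   delta survives any later relocation of the whole section. */
void resolve_call(uint8_t *text,
                  uint32_t callsite_offset,   /* offset of the rel32 field */
                  uint32_t callee_offset)     /* offset of the target      */
{
    /* Displacement relative to the end of the 4-byte immediate
       (x86-64 style); adjust for other encodings. */
    int32_t delta = (int32_t)(callee_offset - (callsite_offset + 4));
    memcpy(text + callsite_offset, &delta, sizeof(delta));
}
```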
For reference, see:
ELF (Executable and Linkable Format)
COFF (Common Object File Format)
Windows Portable Executable Format
Relocation
Object file
Executable
Position Independent Code
Addressing Mode

TL;DR: the distance from a call to its target is a link-time constant.
The .o object files you get from assembling asm have relocation records for symbols that aren't defined in that file.
When you link these .o files into an executable or library, the linker lays out the .text section from each .o into one big .text section for the executable and calculates the relative distance for each call to reach its target. It encodes that relative displacement right into the machine code for each call.
At run time no further work is needed: wherever the whole executable is loaded in memory, distances between instructions don't change, so relative calls never need runtime relocations.
Related: Why are global variables in x86-64 accessed relative to the instruction pointer?


MIPS functions and variables in stack

I have come in contact with MIPS-32, and I came up with the question of whether a variable, for example one whose value is held in $t0 in one function, can be altered by another function, and what this has to do with the stack, that is, with the location of the variable in memory. Everything I am talking about is in assembly language. Moreover, I would like some examples concerning this use, that is, a function altering (or not) a variable's value from another function, and how this variable "survives" or not, in terms of whether the variable is given as a copy or as a reference.
(I hope we can create an environment where conceptual questions like the one above can be explored more.)
$t0 declared having the value in one function can be altered by another
$t0 is known as a call-clobbered register.  It is no different from the other registers as far as the hardware is concerned: being call-clobbered vs. call-preserved is an aspect of software convention, called the calling convention, which is a subset of an Application Binary Interface (ABI).
The calling convention, when followed, allows a function, F, to call another function, G, knowing only G's signature — name, parameters & their types, return type.  The function, F, would not have to also be changed if G changes, as long as both follow the convention.
Call clobbered doesn't mean it has to be clobbered, though, and when writing your own code you can use it any way you like (unless your coursework says to follow the MIPS32 calling convention, of course).
By the convention, a call-clobbered register can be used without worry: all you have to do to use it is put a value into it!
Call-preserved registers can also be used, if desired, but they should be presumed to be already in use by some caller (maybe not the immediate caller, but some distant caller), so the values they contain must be restored before returning to the caller.  This is, of course, only possible by saving the original value before repurposing the register for a new use.
The two sets of registers (call-clobbered/call-preserved) serve two common use cases, namely cheap temporary storage and long-term variables.  The former requires no effort to preserve/restore, while the latter does require this effort but gives us registers that will survive a function call, which is useful, for example, when a loop contains a function call.
The stack comes into play when we need to first preserve, then restore, call-preserved registers.  If we want to use call-preserved registers, we need to save their original values in order to restore them later; the most reasonable place to do that is the stack, so we allocate some space from it.
To allocate some local memory, the stack pointer is decremented to reserve some space for the function.  Since the stack pointer, upon return to the caller, must have the same value, this space is necessarily deallocated upon return.  Hence the stack is great for local storage.  Original values of preserved registers must also be restored upon return to the caller, so this local storage is the appropriate place to save them.
https://www.dyncall.org/docs/manual/manualse11.html — search for section "MIPS32".
Let's also make the distinction between variables, a logical concept, and storage, a physical concept.
In a high-level language, variables are named and have scopes (limited lifetimes).  In machine code, we have physical hardware (storage) resources of registers and memory; these simply exist: they have no concept of lifetime.  In and of themselves these hardware resources are not variables, but places that we can use to hold variables for their lifetime/scope.
As assembly language programmers, we keep a mental (or even written) map of our logical variables to physical resources.  The compiler does the same, knowing the scope/lifetime of program variables and creating that "mental" map of variables to machine code storage.  Variables that have overlapping lifetimes cannot share the same hardware resource, of course, but when a variable is out of scope, its (mapped-to) physical resource can be reused for another purpose.
Logical variables can also move around to different physical resources.  A logical variable that is a parameter, may be passed in a CPU register, e.g. $a0, but then be moved into an $s register or into a (stack) memory location.  Such is the business of machine code.
To allocate some hardware storage to a high level language (or pseudo code) variable, we simply initialize the storage!  Hardware resources are necessarily constantly being repurposed to hold a different logical variable.
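As a sketch of how this mapping plays out, consider a loop that contains a function call, written here in C. The comments describe one plausible register assignment under the MIPS32 convention; an actual compiler may choose differently.

```c
extern int g(int);   /* some callee; by convention it may clobber the $t registers */

int sum_of_g(int n)
{
    int total = 0;
    /* 'total' and 'i' must survive each call to g, so a compiler would
       likely keep them in call-preserved registers such as $s0/$s1,
       saving those registers to the stack in the prologue (decrement
       $sp, store) and restoring them in the epilogue, as described
       above.  A short-lived temporary like the return value of g can
       live in a call-clobbered $t or $v register with no save/restore
       effort. */
    for (int i = 0; i < n; i++)
        total += g(i);
    return total;
}
```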
See also:
How a recursive function works in MIPS? — for discussion on variable analysis.
Mips/assembly language exponentiation recursivley
What's the difference between caller-saved and callee-saved in RISC-V

Replacing a constant with a different value determined by the backend

Consider an architecture (namely, the Infocom Z-machine) with a large, read-only memory region (called "high memory") that is only intended to store strings (and machine code, but that doesn't pose a problem). This region can only be accessed by certain instructions that display text. Of course, this means that pointers to high memory can't be dereferenced.
I'd like to write an LLVM backend for this architecture. In order to do this, I need a way to tell the backend to store certain strings in high memory, and to obtain the "packed addresses" of said strings (also to convert the strings to the Z-Machine string encoding, but that's not the point).
Ideally, I'd be able to define a C function-like macro HIGHMEM_STRING which would take a string literal and expand to an integer constant. Supposing there's a function void print_paddr(uint16_t paddr), I'd like to be able to do:
print_paddr(HIGHMEM_STRING("It is pitch black. You are likely to be eaten by a grue."));
And then the backend would know to place the string in high memory and pass its packed address to print_paddr as a parameter.
My question has three parts:
Can I implement such a macro using LLVM intrinsics, or an asm block with a special directive for the backend, or some other similar way without having to fork Clang? Otherwise, what would I have to change in Clang?
How can I annotate the LLVM IR to convey to the backend that a string should be placed in high memory and replaced with its packed address?
If HIGHMEM_STRING is too hard, or impossible, to implement as a macro, what are the alternatives?
The Hexagon backend does something similar by storing information in a special section whose base address is loaded in the GP register; the referencing instruction has an offset inside that section. Look for CONST64 to get an idea of how these are processed.
Basically when we identify the data in LLVM IR we want to put in this special section, we create a pseudo instruction with the data. When we are writing out the ELF file we switch sections to the GP-rel section, emit the data, then switch back to the text section and emit the instruction to dereference this symbol.
It'll probably be easier if you can identify these strings based on their contents rather than having the user specify them in the program text.
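As a hedged illustration of the annotation half of the question: Clang's section attribute survives into LLVM IR as a section annotation on the global, which a custom backend could recognize. The section name ".highmem" and the lowering to print_paddr are assumptions for this sketch; computing actual packed addresses would still require backend support.

```c
#include <stdint.h>

void print_paddr(uint16_t paddr);  /* from the question */

/* Marking the string with a named section makes it identifiable in the
   LLVM IR (the global carries section ".highmem"), so a backend could
   move it to high memory and rewrite references as packed addresses. */
__attribute__((section(".highmem")))
static const char grue_msg[] =
    "It is pitch black. You are likely to be eaten by a grue.";

void example(void)
{
    /* This cast stands in for the packed-address computation the
       backend would have to perform; it is not valid as-is. */
    print_paddr((uint16_t)(uintptr_t)grue_msg);
}
```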

What are the advantages of inline?

I've been reading a bit about how to use the inline metadata with the ASC 2.0 compiler.
However, I can't find any source of info explaining why I should use it.
Anyone know?
Functions induce overhead in any programming language. In ActionScript, when function execution begins, a number of objects and properties are created.
First, an activation object is created that stores the parameters and local variables declared in a function body. It's an internal mechanism that cannot be directly accessed.
Second, a scope chain is created that contains an ordered list of objects that the Flash platform checks for identifier declarations. Every function that executes has a scope chain that is stored in an internal property.
Function closures maintain a snapshot of a function and its lexical environment.
Moving code inline reduces the creation of these objects and how references are maintained on the stack. Per Flash, you may see a 4x performance increase.
Of course, there are tradeoffs: inlining induces code complexity, and it increases the amount of bytecode. Besides producing larger applications, the virtual machine spends additional time verifying and JIT-compiling the extra code.
To simplify, inline is a sort of copy/paste of code. Since method calls are expensive and cost execution time, the inline keyword copies the body of the method to each place where a call to it appears, so each method call is replaced by the method's body. Since this is done at compile time, it will in theory increase the size of the resulting app (if an inline method is called 10 times, its body is copied and pasted 10 times), but since all calls are replaced you gain execution speed. This is, of course, only relevant for performance-critical code, such as loops running on every frame.
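The same idea, sketched in C for illustration (the mechanism is analogous; ASC 2.0 applies it to ActionScript methods):

```c
#include <stdio.h>

/* A small function the compiler is free to expand at each call site. */
static inline int square(int x)
{
    return x * x;
}

int main(void)
{
    int total = 0;
    /* With inlining, each iteration avoids call overhead: the body
       x * x is substituted for the call, trading code size for speed,
       exactly the copy/paste tradeoff described above. */
    for (int i = 0; i < 10; i++)
        total += square(i);
    printf("%d\n", total);
    return 0;
}
```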

cudaMemcpy() vs cudaMemcpyFromSymbol()

I'm trying to figure out why cudaMemcpyFromSymbol() exists. It seems that everything the 'symbol' function can do, the non-symbol commands can do.
The symbol function appears to make it easy for part of an array or an index to be moved, but this could just as easily be done with the non-symbol function. I suspect the non-symbol approach will run faster, as there is no symbol lookup needed. (It is not clear whether the symbol lookup calculation is done at compile time or at run time.)
Why would I use cudaMemcpyFromSymbol() vs cudaMemcpy()?
cudaMemcpyFromSymbol is the canonical way to copy from any statically defined variable in device memory.
cudaMemcpy can't be directly used to copy to or from a statically defined device variable because it requires a device pointer, and that isn't known to host code at runtime. Therefore, an API call which can interrogate the device context symbol table is required. The two choices are: cudaMemcpyFromSymbol, which does the symbol lookup and copy in one operation, or cudaGetSymbolAddress, which returns an address that can be passed to cudaMemcpy. The former is probably more efficient if you only want to do one copy, the latter if you want to use the address multiple times in host code.
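A minimal sketch showing both routes (the variable name devCounter and the trivial kernel are just for illustration):

```c
#include <cstdio>

__device__ int devCounter;          // statically defined device variable

__global__ void bump() { devCounter = 42; }

int main()
{
    bump<<<1, 1>>>();

    // Route 1: symbol lookup and copy in one operation.
    int a = 0;
    cudaMemcpyFromSymbol(&a, devCounter, sizeof(a), 0,
                         cudaMemcpyDeviceToHost);

    // Route 2: look the address up once, then reuse it with cudaMemcpy.
    int b = 0;
    void *addr = NULL;
    cudaGetSymbolAddress(&addr, devCounter);
    cudaMemcpy(&b, addr, sizeof(b), cudaMemcpyDeviceToHost);

    printf("%d %d\n", a, b);        // both print 42
    return 0;
}
```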

What does executable file actually contain?

What does an executable actually contain? Does it contain instructions to the processor in the form of opcodes and operands? If so, why do we have different executables for different operating systems?
Processors understand programs in terms of opcodes, so your intuition is correct: any executable has to contain the opcodes and operands for executing the program on a processor.
However, programs mostly execute with the help of operating systems (you can write programs which do not use an OS to execute, but that would be a lot of unnecessary work) - which provide abstractions on top of the hardware which the programs can use. The OS is responsible for setting up a "context" for any program to run i.e. provide the program the memory it needs, provide general purpose libraries which the program can use for doing common stuff such as write to files, print to console etc.
However, to set up the context for the program (provide it memory, load its data, set up a stack for it), the OS needs to read the program's executable file and learn a few things about the program: the data the program expects to use, the size of that data, the initial values stored in that data region, the list of opcodes that make up the program (also called the text region of a process), their size, etc. All of this, and a lot more (debugging information, read-only data such as hardcoded strings in the program, symbol tables, etc.), is stored within the executable file. Each OS understands a different format of executable file, since each expects this information to be stored in a different way. Check out the links provided by Groo.
A couple of formats that have been used for storing information in an executable file are ELF and COFF on UNIX systems and PE on Windows.
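Those formats announce themselves right at the start of the file; for example, an ELF file begins with the 4-byte magic number 0x7F 'E' 'L' 'F' (a PE file instead starts with the "MZ" DOS header). A small sketch that checks for it, with error handling kept minimal:

```c
#include <stdio.h>

/* Peek at the first four bytes of a file and report whether they
   match the ELF magic number. */
int main(int argc, char **argv)
{
    if (argc < 2) return 1;
    FILE *f = fopen(argv[1], "rb");
    if (!f) return 1;

    unsigned char magic[4] = {0};
    fread(magic, 1, sizeof(magic), f);
    fclose(f);

    if (magic[0] == 0x7F && magic[1] == 'E' &&
        magic[2] == 'L' && magic[3] == 'F')
        printf("%s looks like an ELF executable\n", argv[1]);
    else
        printf("%s is not ELF\n", argv[1]);
    return 0;
}
```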
P.S. - Not all programs need executable formats. Look up bootloaders on Google. These are special programs which occupy the first sector of a bootable partition on the hard-disk and are used to load the OS itself.
Yes, code in the form of opcodes and operands, and data of course. Anything you want to do that involves the operating system in any way depends on the operating system, not on the CPU. That is why you need different programs for different operating systems. Opening a window in Windows is not done with the same sequence of instructions as in Linux, and so on.
As unwind implied in his answer, an executable file contains calls to routines in the Operating System.
It would be extremely inefficient for an executable file to try to implement functions already provided by the OS (for example, writing to disk, accepting input) so heavy use is made of calls to the OS functions.
Different Operating Systems provide functions which do similar things, but the details of how to call those functions (and where they are) may be different.
So, apart from the major differences of processor type, executables written for one OS won't work with another.
To do any form of IO, an executable needs to interface with the operating system using syscalls. On Windows these are calls to the Win32 API, and on Linux/Unix these are mostly POSIX calls.
Furthermore, the executable file format differs with the OS, the same way a PNG file differs from a GIF file: the data is ordered differently and there are different headers and sub-headers.
An executable file contains several blobs of data and instructions on how the data should be loaded into memory. Some of these sections happen to contain machine code that can be executed. Other sections contain program data, resources, relocation information, import information, etc.