Pros and Cons of ISA Extensions on RISC-V cores - hardware-acceleration

Because RISC-V is an open-source architecture, one can come up with ISA extensions for dedicated purposes (e.g. bit manipulation or cryptographic implementations).
While I understand that such extensions make it possible to speed up the calculations for a particular function, it is not clear to me what the drawbacks are.
In which cases should I prefer a software implementation over an ISA extension, if one that fits my needs is available? What additional costs should be taken into consideration when integrating several ISA extensions into a RISC-V core?

Related

What is the real difference between Firmware and Embedded Software?

I am trying to pin down the real difference between firmware and embedded software.
On the internet it is usually written that firmware is a type of embedded software but not vice versa, together with the classic (and very old) BIOS example.
Both are stored in non-volatile memory. One stated difference is that embedded software is more like application programming, with an RTOS and a file system, and can be run from RAM.
If I don't use an RTOS or RAM and only use flash memory, does that mean my embedded software is firmware?
Is the real difference actually the memory layout?
The answers I have found on the internet lack technical explanation and are not satisfying.
Thank you very much.
They are not distinctly separate things, or even well defined. Firmware is a subset of software; the term typically implies that it is in read-only memory:
Software refers to any machine executable code - including "firmware".
Firmware refers to software in read-only memory.
Read-only memory in this context includes re-writable memory such as flash or EPROM that requires a specific erase/write operation and is not simply random-access writable.
The distinction between RAM and ROM execution is not really a distinction between firmware and software. Many embedded systems load executable code from ROM and execute it from RAM for performance reasons, while others execute directly from ROM. Rather, if the end user cannot easily modify or replace the software without special tools or a bootloader, then it might be regarded as "firm". If, on the other hand, a normal end user can modify, update or replace the software using facilities on the system itself (by copying a file from removable media or a network, for example), then it is not firmware. Consider the difference in operation between updating your PC's BIOS and updating Microsoft Office - the former requires a special procedure, distinct from normal operating-system services, for loading and running software.
For example, the operating system, bootloader and BIOS of a smart phone might be considered firmware. The apps a user loads from an app-store are certainly not firmware.
In other contexts "firmware" might refer to the configuration of a programmable logic device such as an FPGA, as opposed to sequentially executed processor instructions. That is rather a niche distinction, but it is useful in systems employing both programmable logic and software execution.
Ultimately you would use the term "firmware" to imply some level of "permanence" of software in a system, but there is a spectrum, so you would use the term in whatever manner is useful in the context of your particular system. For example, I am working on a system where all the code runs from flash, so I only ever use the term "software" to refer to it, because there is no need to distinguish it from any other kind of software in the system.

What type of machine language do PCs generally run on

I've recently begun researching what it would take to write a JIT compiler. I've been studying machine language, but I haven't been able to find out what type of machine language most standard PCs run on. I found this PDF which seems to explain a type of machine language, but it says it's MIPS, which, after looking it up, seems to be an older machine language used in video game consoles and routers. So, my question is:
What machine language do most modern personal computers (i.e. laptops, desktops) run on?
Or, is it indeterminable? Are there many machine languages? Or maybe I'm wrong, and MIPS is standard?
The machine language used by a given processor is a function of its instruction-set architecture ("ISA").
Most desktop and laptop computers today running Microsoft Windows use "64-bit" processors implementing the "x86-64" ISA, such as those in Intel's "Core i5" and "Core i7" processor families. Commonly referred to as "x64", this is the 64-bit extension (created by AMD) of the original "IA-32" ISA (created by Intel).
Both "IA-32" and "x64" are examples of Complex Instruction Set Computing ("CISC") architectures. On the other hand, MIPS is an example of the much simpler Reduced Instruction Set Computing ("RISC") style of architectures.
When talking about JIT compilers, it is important to distinguish between the ISA of the virtual machine running the byte-code and the ISA of the underlying physical processor. Most virtual machines are based upon RISC architectures, because of their relative simplicity. However, most likely this VM-plus-JIT-compiler will be physically running on an x64-compatible CISC processor.
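To make this concrete, here is a minimal, hedged sketch of the core trick a JIT ultimately performs on such an x64 machine: writing raw x86-64 instruction bytes into executable memory and calling them as a function. It assumes a Linux (or similar POSIX) system and the System V calling convention, and it deliberately ignores everything a real JIT adds (register allocation, relocation, W^X security policies).

    /* Hedged sketch of runtime machine-code generation on x86-64 Linux:
       "mov eax, 42 ; ret" encoded by hand, then executed from an mmap'd
       executable buffer. Not a real JIT, just the core idea. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void) {
        /* B8 2A 00 00 00  =  mov eax, 42   (eax holds the return value)
           C3              =  ret                                        */
        unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

        void *buf = mmap(NULL, sizeof code,
                         PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) { perror("mmap"); return 1; }

        memcpy(buf, code, sizeof code);          /* "emit" the machine code */

        int (*fn)(void) = (int (*)(void))buf;    /* treat the buffer as a function */
        printf("%d\n", fn());                    /* prints 42 */

        munmap(buf, sizeof code);
        return 0;
    }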

What is CUDA like? What is it for? What are the benefits? And how to start?

I am interested in developing with some new technology and I was thinking of trying out CUDA. Now... their documentation is too technical and doesn't provide the answers I'm looking for. Also, I'd like to hear those answers from people who've already had some experience with CUDA.
Basically my questions are those in the title:
What exactly IS CUDA? (is it a framework? Or an API? What?)
What is it for? (is there something more than just programming to the GPU?)
What is it like?
What are the benefits of programming against CUDA instead of programming to the CPU?
What is a good place to start programming with CUDA?
CUDA brings together several things:
Massively parallel hardware designed to run generic (non-graphic) code, with appropriate drivers for doing so.
A programming language based on C for programming said hardware, and an assembly language that other programming languages can use as a target.
A software development kit that includes libraries, various debugging, profiling and compiling tools, and bindings that let CPU-side programming languages invoke GPU-side code.
The point of CUDA is to write code that can run on compatible massively parallel SIMD architectures: this includes several GPU types as well as non-GPU hardware such as nVidia Tesla. Massively parallel hardware can run a significantly larger number of operations per second than the CPU, at a fairly similar financial cost, yielding performance improvements of 50× or more in situations that allow it.
One of the benefits of CUDA over the earlier methods is that a general-purpose language is available, instead of having to use pixel and vertex shaders to emulate general-purpose computers. That language is based on C with a few additional keywords and concepts, which makes it fairly easy for non-GPU programmers to pick up.
It's also a sign that nVidia is willing to support general-purpose parallelization on their hardware: it now sounds less like "hacking around with the GPU" and more like "using a vendor-supported technology", and that makes its adoption easier in presence of non-technical stakeholders.
To start using CUDA, download the SDK, read the manual (seriously, it's not that complicated if you already know C) and buy CUDA-compatible hardware (you can use the emulator at first, but performance being the ultimate point of this, it's better if you can actually try your code out).
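To give a feel for the "C with a few additional keywords" mentioned above, here is a hedged, minimal sketch of a CUDA vector-add kernel and its launch (names like vecAdd are just illustrative, and error checking is omitted). It is compiled with nvcc rather than a plain C compiler.

    // Minimal CUDA C sketch: add two vectors on the GPU.
    // __global__ marks a kernel that runs on the device; <<<blocks, threads>>>
    // is the launch syntax; the rest is ordinary C.
    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <stdlib.h>

    __global__ void vecAdd(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // one element per thread
        if (i < n)
            c[i] = a[i] + b[i];
    }

    int main(void)
    {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);

        float *h_a = (float *)malloc(bytes);
        float *h_b = (float *)malloc(bytes);
        float *h_c = (float *)malloc(bytes);
        for (int i = 0; i < n; i++) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

        float *d_a, *d_b, *d_c;
        cudaMalloc((void **)&d_a, bytes);
        cudaMalloc((void **)&d_b, bytes);
        cudaMalloc((void **)&d_c, bytes);
        cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

        vecAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);   // launch the kernel
        cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

        printf("c[0] = %f\n", h_c[0]);    /* expect 3.0 */

        cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
        free(h_a); free(h_b); free(h_c);
        return 0;
    }

Apart from __global__, the angle-bracket launch syntax and the built-in thread indices, it reads like ordinary C, which is exactly the point made above about how approachable the language is for non-GPU programmers.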
(Disclaimer: I have only used CUDA for a semester project in 2008, so things might have changed since then.) CUDA is a development toolchain for creating programs that can run on nVidia GPUs, as well as an API for controlling such programs from the CPU.
The benefit of GPU programming vs. CPU programming is that for some highly parallelizable problems you can gain massive speedups (about two orders of magnitude). However, many problems are difficult or impossible to formulate in a manner that makes them suitable for parallelization.
In one sense, CUDA is fairly straightforward, because you can use regular C to create the programs. However, in order to achieve good performance, a lot of things must be taken into account, including many low-level details of the Tesla GPU architecture.

What is an ABI (Application Binary Interface)?

This is what wikipedia says:
In computer software, an application binary interface (ABI) describes the low-level interface between an application (or any type of) program and the operating system or another application.
ABIs cover details such as data type, size, and alignment; the calling convention, which controls how functions' arguments are passed and return values retrieved; the system call numbers and how an application should make system calls to the operating system; and in the case of a complete operating system ABI, the binary format of object files, program libraries and so on. A complete ABI, such as the Intel Binary Compatibility Standard (iBCS), allows a program from one operating system supporting that ABI to run without modifications on any other such system, provided that necessary shared libraries are present, and similar prerequisites are fulfilled.
I guess that an ABI is a convention or standard, and compilers/linkers use this convention to produce object code. Is that right? If so, who makes these conventions (companies, or some organization)? What was it like when there were no ABIs? Are there documents about these ABIs that we can refer to?
You're correct about the definition of an ABI, up to a point. The classic example is the syscall interface in Linux (and other UNIXes).
They are a standard way for code to request the operating system to carry out certain duties.
As such, they're decided by the people that wrote the OS or, in the case where the syscalls have been added later, by whoever added them (in cases where the OS allows this). For example, the Linux syscall interface on x86 states that you load the syscall number into eax, with other parameters placed in ebx, ecx and so on, depending on the syscall you're making (eax).
Typically, it's not the compiler or linker which do the work of interfacing, rather it's the libraries provided for the language you're using.
Returning to Linux, the GNU C library contains code for fopen (for example) which eventually calls the relevant syscall to perform the lower-level work (syscall number 5, open). A list of the syscalls can be found in this PDF file.
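As a hedged illustration of that register convention (which is normally hidden inside the C library), here is roughly what a raw 32-bit Linux write(2) syscall looks like. It assumes an x86 (i386) Linux target, e.g. built with gcc -m32, and GCC-style inline assembly.

    /* Sketch only: invoke write(2) directly via int 0x80 on 32-bit x86 Linux,
       following the eax/ebx/ecx/edx convention described above.
       In normal code you would simply call write() from the C library. */
    #include <stddef.h>

    static long raw_write(int fd, const void *buf, size_t count) {
        long ret;
        __asm__ volatile (
            "int $0x80"          /* trap into the kernel                   */
            : "=a" (ret)         /* result comes back in eax               */
            : "a" (4),           /* eax = 4 -> __NR_write (32-bit table)   */
              "b" (fd),          /* ebx = first argument: file descriptor  */
              "c" (buf),         /* ecx = second argument: buffer          */
              "d" (count)        /* edx = third argument: byte count       */
            : "memory");
        return ret;
    }

    int main(void) {
        raw_write(1, "hello from int 0x80\n", 20);
        return 0;
    }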
"Specification" is a more suitable term than "convention": a convention is a loose term for a widely accepted practice, whereas a specification is well defined.
You are right. The specification is made by a standardization body. Take a look at the POSIX specification: it is supported by Windows, compiler/build tool-chains such as gcc assume the OS adheres to it, and even the Linux kernel adheres to it partially (almost exactly).
Before ABIs? Even today, firmware is hand-crafted as new chips come along for set-top boxes and other devices with embedded systems.
The documentation for such a chip is the digital-logic content in its data-sheet; the chip is programmed in assembly language, and for a higher-level language the cross-compiler tool-chain documentation spells out the assumptions that would otherwise be part of an ABI.
Well, the concept of an ABI was presumably conceived to support binary compatibility of your program across operating systems and machine architectures. Suppose you wrote a program on some operating system distribution running on the x86 architecture. For a programmer, the important thing is that this program should run exactly the same on any other machine with the same architecture and ABI, and this is where the concept of the ABI, or Application Binary Interface, comes in. Every machine architecture defines its own way in which the operating system kernel talks to the outside world, i.e. user-space programs, so each architecture defines a different set of system calls, machine registers, conventions for how those registers are used, how software interrupts are handled by the kernel, and so on. The ABI pins these things down for compiling, linking, byte ordering and so on. System programmers have had a hard time defining a uniform ABI for the same operating system running on different architectures, which is why every machine architecture has its own, and you need to compile your program to conform to the format those machines expect.
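One concrete example of something an ABI pins down, beyond system calls, is data layout. A hedged sketch, assuming an x86-64 System V target where double has 8-byte alignment:

    /* Illustration only: the ABI dictates struct size, alignment and padding.
       On x86-64 System V, 'd' must be 8-byte aligned, so 7 padding bytes are
       inserted after 'c' and the struct is 16 bytes. Another ABI may lay this
       out differently, which is why binaries built against different ABIs
       cannot safely share data structures. */
    #include <stdio.h>
    #include <stddef.h>

    struct sample {
        char   c;   /* offset 0                        */
        double d;   /* offset 8, after inserted padding */
    };

    int main(void) {
        printf("sizeof(struct sample)      = %zu\n", sizeof(struct sample));
        printf("offsetof(struct sample, d) = %zu\n", offsetof(struct sample, d));
        return 0;
    }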

Financial applications on GPGPU

I want to know what sort of financial applications can be implemented using a GPGPU. I'm aware of option pricing / stock price estimation using Monte Carlo simulation on a GPGPU with CUDA. Can someone enumerate the various possibilities of utilizing a GPGPU for applications in the finance domain?
There are many financial applications that can be run on the GPU in various fields, including pricing and risk. There are some links from NVIDIA's Computational Finance page.
It's true that Monte Carlo is the most obvious starting point for many people. Monte Carlo is a very broad class of applications, many of which are amenable to the GPU. Many lattice-based problems can also be run on the GPU. Explicit finite difference methods run well and are simple to implement; there are many examples on NVIDIA's site as well as in the SDK, and the technique is also used a lot in Oil & Gas codes, so there is plenty of material. Implicit finite difference methods can also work well depending on the exact nature of the problem; Mike Giles has a 3D ADI solver on his site, which also has other useful finance material.
GPUs are also good for linear algebra type problems, especially where you can leave the data on the GPU and do a reasonable amount of work on it. NVIDIA provides cuBLAS with the CUDA Toolkit, and you can get cuLAPACK too.
Basically, anything that requires a lot of parallel mathematics to run. As you originally stated, Monte Carlo simulation of options that cannot be priced with closed-form solutions is an excellent candidate. Anything that involves large matrices and operations upon them will be ideal; after all, 3D graphics use a lot of matrix mathematics.
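For a flavour of what the Monte Carlo case looks like in practice, here is a hedged, much-simplified CUDA sketch: one geometric-Brownian-motion path per thread for a European call, using the cuRAND device API. The parameter names and values (S0, K, r, sigma, T) are purely illustrative; a real pricer would use many time steps, variance reduction, and proper error handling.

    // Hedged sketch, not production pricing code: one Monte Carlo path per
    // thread for a European call under geometric Brownian motion.
    #include <cuda_runtime.h>
    #include <curand_kernel.h>
    #include <stdio.h>

    __global__ void mc_call(float S0, float K, float r, float sigma, float T,
                            unsigned long long seed, float *payoff, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        curandState state;
        curand_init(seed, i, 0, &state);        // independent stream per thread

        float z  = curand_normal(&state);       // standard normal draw
        float ST = S0 * expf((r - 0.5f * sigma * sigma) * T
                             + sigma * sqrtf(T) * z);
        payoff[i] = expf(-r * T) * fmaxf(ST - K, 0.0f);   // discounted payoff
    }

    int main(void)
    {
        const int n = 1 << 20;
        float *d_payoff;
        cudaMalloc((void **)&d_payoff, n * sizeof(float));

        mc_call<<<(n + 255) / 256, 256>>>(100.f, 100.f, 0.05f, 0.2f, 1.0f,
                                          1234ULL, d_payoff, n);
        cudaDeviceSynchronize();

        /* Averaging d_payoff (e.g. with thrust::reduce or a reduction kernel)
           gives the price estimate; omitted here for brevity. */
        cudaFree(d_payoff);
        return 0;
    }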
Given that many trader desktops have 'workstation' class GPUs in order to drive several monitors, possibly with video feeds and limited 3D graphics (volatility surfaces, etc.), it would make sense to run some of the pricing analytics on the GPU rather than pushing the responsibility onto a compute grid; in my experience the compute grids are frequently struggling under the weight of everyone in the bank trying to use them, and some of the grid computing products leave a lot to be desired.
Outside of this particular problem, there's not a great deal more that can be easily achieved with GPUs, because the instruction set and pipelines are more limited in their functional scope compared to a regular CISC CPU.
The problem with adoption has been one of standardisation; NVidia had CUDA, ATI had Stream. Most banks have enough vendor lock-in to deal with without hooking their derivative analytics (which many regard as extremely sensitive IP) into a gfx card vendor's acceleration technology. I suppose with the availability of OpenCL as an open standard this may change.
F# is used a lot in finance, so you might check out these links
http://blogs.msdn.com/satnam_singh/archive/2009/12/15/gpgpu-and-x64-multicore-programming-with-accelerator-from-f.aspx
http://tomasp.net/blog/accelerator-intro.aspx
High-end GPUs are starting to offer ECC memory (a serious consideration for financial and, eh, military applications) and high-precision types.
But it really is all about Monte Carlo at the moment.
You can go to workshops on it, and from their descriptions you can see that they focus on Monte Carlo.
A good start would be probably to check NVIDIA's website:
CUDA's Finance Showcases
CUDA's Finance Tutorials
Using a GPU introduces limitations on the architecture, deployment and maintenance of your app.
Think twice before you invest effort in such a solution.
E.g. if you're running in a virtual environment, it would require all physical machines to have GPU hardware installed, plus special vGPU hardware and software support and licenses.
What if you decide to host your service in the cloud (e.g. Azure, Amazon)?
In many cases it is worth building your architecture in advance to support scaling out and to be flexible (with some overhead, of course), rather than scaling up and squeezing as much as you can from your hardware.
Answering the complement of your question: anything that involves accounting can't be done on a GPGPU (or in binary floating point, for that matter).
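A quick hedged illustration of why: binary floating point cannot represent most decimal currency amounts exactly, so totals drift, which is unacceptable in accounting (integer cents or decimal types are used instead).

    /* Illustration: adding ten cents a hundred times in double precision
       does not give exactly 10.00, because 0.10 has no exact binary form. */
    #include <stdio.h>

    int main(void) {
        double total = 0.0;
        for (int i = 0; i < 100; i++)
            total += 0.10;                     /* add ten cents, 100 times */

        printf("total = %.17f\n", total);      /* close to, but not exactly, 10 */
        printf("%s\n", total == 10.0 ? "exactly 10" : "not exactly 10");
        return 0;
    }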