I'm currently working on a MIPS code generator for my Pascal parser (written in C using Lex/Yacc). Does anybody know of a tool I can use as a reference to make sure the generated code is correct?
Here is a MIPS simulator. I used it in school to check and run my MIPS projects. One thing I remember is that this simulator has a few commands (to make things easier for us students) that real MIPS toolchains don't have. I am pretty sure it is all documented, though.
You can build the GNU Pascal Compiler for a MIPS target, as a cross-compiler.
I am using the Hopper Disassembler to reverse engineer an iOS library. At first, everything is clear and logical, but I can't find any information about asm. What does asm mean? Is it a function call? If it is a function, what does it do? Thanks!
Disassembled code screenshot
It's inline assembly, equivalent to the asm keyword in C. Wikipedia might serve as an introduction.
You almost certainly get this because Hopper fails to properly decompile the instruction. In your case it's arm64 assembly (formally the A64 instruction set) as outlined in the ARMv8 Reference Manual.
Also, xN (a 64-bit register) or wN (the lower 32 bits of the same register) in the assembly should correspond to rN in the decompiled code, so you should be able to make some sense of the output you get.
How can I compile MPI/CUDA and UPC/CUDA hybrid code? Do I have to compile the parts separately, or can I mix the language constructs and compile everything as a single source file? Could someone with experience in this area help? Thanks in advance.
MPI/CUDA: As JackOLantern has pointed out, you can write the MPI and CUDA code in separate files, compile them separately, and link them together.
For UPC, if it is Berkeley UPC, the same procedure works, but you have to make a small change in the initial configuration: when defining the compiler parameters, provide NVCC as both the C and the C++ compiler.
I have code written in old-style Fortran 95 for combustion modelling. One feature of this problem is that one has to solve a stiff ODE system to account for the influence of chemical reactions. For this purpose I use the Fortran SLATEC library, which is also quite old. The solution procedure is straightforward: one just needs to call the subroutine ddriv3 in every cell of the computational domain, so it looks something like this:
```fortran
do i = 1, Number_of_cells   ! Number of cells is about 2000
  call ddriv3(...)          ! Calls for different i are independent of each other
end do
```
ddriv3 is quite complex and uses many other library functions.
Is there any way to benefit from CUDA Fortran without looking for another library for this purpose? If I just run this as a "parallel loop", will that be efficient, or is there another way?
I'm sorry for asking a question that immediately invites the obvious answer, "Why don't you just try it and find out yourself?", but I'm under severe time pressure. I have no experience with CUDA and just want to choose the most appropriate and easiest way to start.
Thanks in advance!
You won't be able to use or parallelize the ddriv3 call without some effort. Your use of the phrase "parallel loop" suggests to me that you may be thinking of using OpenACC directives with Fortran, as opposed to CUDA Fortran, but the general answer is the same in either case.
The ddriv3 call, being part of a Fortran library (which is presumably compiled for x86), cannot be used directly in either CUDA Fortran (i.e. using CUDA GPU kernels within Fortran) or OpenACC Fortran, for essentially the same reason: the library code is x86 code and cannot run on the GPU.
Since you presumably have access to the source of ddriv3, you might be able to extract it and create a CUDA version of it (or a version that OpenACC won't choke on). However, if it uses many other library routines, you may have to create CUDA versions (or plain Fortran source, for OpenACC) of each of those routines as well. If you have no experience with CUDA, this might not be what you want to do (I don't know). If you go down this path, it would certainly mean learning more about CUDA, or at least converting the library calls to plain Fortran source (for an OpenACC version).
For the above reasons, it might make sense to investigate whether a GPU replacement library (or something similar) exists for the ddriv3 call, although you specifically excluded that option in your question. There are certainly GPU libraries that can help with solving ODEs.
This is a bit of a silly question, but I'm wondering whether CUDA uses an interpreter or a compiler.
I'm asking because I'm not sure how CUDA manages to run the same source code on two cards with different compute capabilities.
From Wikipedia:
Programmers use 'C for CUDA' (C with Nvidia extensions and certain restrictions), compiled through a PathScale Open64 C compiler.
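So the answer is: it uses a compiler. (The Open64-based compiler in that quote describes early CUDA toolkits; since CUDA 4.1, the device compiler in nvcc has been based on LLVM.)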
And to touch on the reason it can run on multiple cards (source):
CUDA C/C++ provides an abstraction, it's a means for you to express how you want your program to execute. The compiler generates PTX code which is also not hardware specific. At runtime the PTX is compiled for a specific target GPU - this is the responsibility of the driver which is updated every time a new GPU is released.
The official documents CUDA C Programming Guide and The CUDA Compiler Driver (NVCC) explain the compilation process in detail.
From the second document:
nvcc mimics the behavior of the GNU compiler gcc: it accepts a range of conventional compiler options, such as for defining macros and include/library paths, and for steering the compilation process.
This is not limited to CUDA: shaders in DirectX or OpenGL are also compiled to some kind of intermediate code and converted to native code by the underlying driver.
I wonder what kinds of optimizations AVM2 (the ActionScript 3 VM) supports. I know it uses a JIT, but does it support dead code elimination, constant folding, inlining, etc.?
It's also very interesting to me that the ActionScript compiler does some optimizations too. AFAIK the C# compiler does only a very small set of optimizations (only those required for language support), and the JIT does all the work. And it works very fast.
Thanks.
Thanks to MPD. AVM2 supports:
Constant Folding
Copy & Constant Propagation
Common Subexpression Elimination (CSE)
Dead Code Elimination (DCE)
Take a look at these slides: ActionScript 3.0 and AVM2: Performance Tuning.
I don't think the Flash/Flex compiler does most of these optimizations, but you can achieve these results with third-party software such as secureSWF (commercial).
You may also be able to find a free or open-source tool that does this.