Hey there,
if the env var "XYZ" is set WHILE compiling, then I want the part:
write (STDOUT,*) "Compiled with XYZ"
here one more function call bla()
to be compiled into the binary. If not, then not.
Any way to do it?
Thanks a lot!
You can't check environment variables while compiling, but you can pass options to the compiler and have its preprocessor act on them. This isn't heavily documented, but works with at least gfortran and Intel ifort. On the compile line use, or not, "-DMYOPTION" (or whatever option name you select). Then in the code:
#ifdef MYOPTION
Fortran source code
#else
Fortran source code
#endif
Apparently the preprocessor lines must start in the first column.
If you use filetype "F90" the preprocessor will be automatically invoked, otherwise you can use a compiler option to invoke this step.
Maybe this will answer your need? If not, you could use a command script to check the environment variable and use different compile commands depending on its value, to make the preprocessor method respond to an environment variable.
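The command-script idea could be sketched roughly as below, here in Python rather than a shell script. This is only an illustration under assumptions: the variable name XYZ comes from the question, while the function name build_compile_command, the file name main.F90, the output name a.out and the choice of gfortran are all made up for the example.

```python
import os


def build_compile_command(source, env=None, compiler="gfortran"):
    """Return the compile command as a list, adding -DXYZ only if XYZ is set.

    The compiler, source file and output name are illustrative assumptions.
    """
    env = os.environ if env is None else env
    command = [compiler]
    # Translate the environment variable into a preprocessor define.
    if "XYZ" in env:
        command.append("-DXYZ")
    command += [source, "-o", "a.out"]
    return command


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch does not
    # depend on a compiler being installed.
    print(" ".join(build_compile_command("main.F90")))
```

The returned list could be handed to subprocess.run to actually compile; the Fortran source would then wrap the optional part in #ifdef XYZ ... #endif as shown above.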
Of course, you can check environment variables at run-time with the intrinsic get_environment_variable; simply using if statements to respond to a value might be easier.
As part of the Fortran 2008 standard, there are intrinsic functions for finding the compiler version and the options a program was compiled with: compiler_version and compiler_options. Fortran compilers are slowly coming up to date with the new standard; gfortran has them, but it doesn't look like ifort 12 does yet:
program compilerinfo
use iso_fortran_env
implicit none
print *,'This program was compiled with ', compiler_version()
print *,'with flags ', compiler_options()
end program compilerinfo
and running gives
$ ./compilerinfo
This program was compiled with GCC version 4.6.0
with flags -mtune=generic -march=x86-64
Only the very newest Fortran compilers provide such a feature.
The nearest mechanism would be to write a program which obtains the environment variable and writes a Fortran subroutine containing the information needed. Add to the project build:
Run the program to grab the environment variable and write the subroutine
Compile the subroutine
Link the object into the rest of the project.
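The generator step could look something like the following sketch. It is written in Python purely for illustration; the subroutine name print_build_info, the file name build_info.f90 and the "Compiled without XYZ" message are assumptions, and only the variable name XYZ and the "Compiled with XYZ" message come from the question.

```python
import os

# Template for the generated Fortran source; the subroutine name is hypothetical.
TEMPLATE = """subroutine print_build_info()
  implicit none
  write (*,*) "{message}"
end subroutine print_build_info
"""


def render_subroutine(env=None):
    """Return Fortran source that records whether XYZ was set at build time."""
    env = os.environ if env is None else env
    if "XYZ" in env:
        message = "Compiled with XYZ"
    else:
        message = "Compiled without XYZ"
    return TEMPLATE.format(message=message)


if __name__ == "__main__":
    # Write the subroutine so the build can compile and link it afterwards.
    with open("build_info.f90", "w") as handle:
        handle.write(render_subroutine())
```

The build would run this script first, compile the generated file, and link the resulting object with the rest of the project, which then calls the generated subroutine.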
Edited to reflect Fortran 2008+ compilers
I am importing a library and I get this error when compiling:
go.cu(61): error: calling a __host__ function("TinyJS::Interpreter::Interpreter()") from a __global__ function("capnduk_kernel") is not allowed
...is there a way to port an entire file (TinyJS) to run on the device?
I've checked the compiler documentation, and it doesn't look like there's a way to do this. I'm guessing the only way is to rewrite the file by hand, which is a can of worms.
There isn't a way to do this with nvcc. It will require manual effort.
While NVCC does not support this (as Robert points out), this is an option for run-time compilation, via the NVRTC library:
Documentation lists the following compilation option:
--device-as-default-execution-space (-default-device)
Treat entities with no execution space annotation as __device__ entities.
Notes:
With this being the case, I would consider submitting a bug report to NVIDIA and asking them to add this option to NVCC.
clang++ supports compiling CUDA, perhaps it has such a flag.
NVRTC is also supported by the Modern-C++ wrappers library for CUDA, which is more convenient to use than working with NVRTC directly. (Caveat: That's my own library.)
Chisel provides the printf function for debugging purposes; when generating Verilog, it becomes the $fwrite system function. How can I get the Verilog simulation to write the data to a file instead of the terminal by modifying only the Chisel code? Can Chisel's printf function do this?
The short answer appears to be no. But there are two issues filed about this
Can chisel implement printf to a file? #1290
Feature Request: Allow printfs to go to different file descriptors #1287
It might be a good PR to submit.
My understanding of Tcl execution is that if a command's compile function is defined, it is called first, before the command's execution function, when the command is about to be executed.
Take command append as example, here is its definition in tclBasic.c:
static CONST CmdInfo builtInCmds[] = {
{"append", (Tcl_CmdProc *) NULL, Tcl_AppendObjCmd,
TclCompileAppendCmd, 1},
Here is my testing script:
$ cat t.tcl
set l [list 1 2 3]
append l 4
I add gdb breakpoints at both functions, TclCompileAppendCmd and Tcl_AppendObjCmd. My expectation is TclCompileAppendCmd is hit before Tcl_AppendObjCmd.
Gdb's target is tclsh8.4 and argument is t.tcl.
What I see is interesting:
TclCompileAppendCmd does get hit first, but it is not from t.tcl,
rather it is from init.tcl.
TclCompileAppendCmd gets hit several times and they all are from init.tcl.
The first time t.tcl executes, it is Tcl_AppendObjCmd that gets hit, not TclCompileAppendCmd.
I cannot make sense of it:
Why is the compile function called for init.tcl but not for t.tcl?
Each script should be compiled independently, i.e. the object holding the compiled append command from init.tcl is not reused for later scripts, is it?
[UPDATE]
Thanks, Brad, for the tip: after I moved the script into a proc, I can see that TclCompileAppendCmd is hit.
The compilation function (TclCompileAppendCmd in your example) is called by the bytecode compiler when it wants to issue bytecode for a particular instance of that particular command. The bytecode compiler also has a fallback if there is no compilation function for a command: it issues instructions to invoke the standard implementation (which would be Tcl_AppendObjCmd in this case; the NULL in the other field causes Tcl to generate a thunk in case someone really insists on using a particular API but you can ignore that). That's a useful behaviour, because it is how operations like I/O are handled; the overhead of calling a standard command implementation is pretty small by comparison with the overhead of doing disk or network I/O.
But when does the bytecode compiler run?
On one level, it runs whenever the rest of Tcl asks for it to be run. Simple! But that's not really helpful to you. More to the point, it runs whenever Tcl evaluates a script value in a Tcl_Obj that doesn't already have bytecode type (or if the saved bytecode indicates that it is for a different resolution context or different compilation epoch) except if the evaluation has asked to not be bytecode compiled by the flag TCL_EVAL_DIRECT to Tcl_EvalObjEx or Tcl_EvalEx (which is a convenient wrapper for Tcl_EvalObjEx). It's that flag which is causing you problems.
When is that flag used?
It's actually pretty simple: it's used when some code is believed to be going to be run only once because then the cost of compilation is larger than the cost of using the interpretation path. It's particularly used by Tk's bind command for running substituted script callbacks, but it is also used by source and the main code of tclsh (essentially anything using Tcl_FSEvalFileEx or its predecessors/wrappers Tcl_FSEvalFile and Tcl_EvalFile). I'm not 100% sure whether that's the right choice for a sourced context, but it is what happens now. However, there is a workaround that is (highly!) worthwhile if you're handling looping: you can put the code in a compiled context within that source using a procedure that you call immediately or use an apply (I recommend the latter these days). init.tcl uses these tricks, which is why you were seeing it compile things.
And no, we don't normally save compiled scripts between runs of Tcl. Our compiler is fast enough that that's not really worthwhile; the cost of verifying that the loaded compiled code is correct for the current interpreter is high enough that it's actually faster to recompile from the source code. Our current compiler is fast (I'm working on a slower one that generates enormously better code). There's a commercial tool suite from ActiveState (the Tcl Dev Kit) which includes an ahead-of-time compiler, but that's focused around shrouding code for the purposes of commercial deployment and not speed.
I am working with compute capability 3.5, CUDA 5 and VS 2010 (and obviously Windows).
I am interested in reading the compiled code to understand better the implication of my C code changes.
What configuration do I need in VS to compile the code for readability (is setting the compilation to PTX enough?)?
What tool do I need to reverse engineer the generated PTX to be able to read it?
In general, to create a ptx version of a particular .cu file, the command is:
nvcc -ptx mycode.cu
which will generate a mycode.ptx file containing the ptx code corresponding to the file you used. It's probably instructive to use the -src-in-ptx option as well:
nvcc -ptx -src-in-ptx mycode.cu
Which will intersperse the lines of source code with the lines of ptx they correspond to.
To comprehend ptx, start with the documentation
Note that the compiler may generate ptx code that doesn't correspond to the source code very well, or is otherwise confusing, due to optimizations. You may wish (perhaps to gain insight) to compile some test cases using the -G switch as well, to see how the non-optimized version compares.
Since the windows environment may vary from machine to machine, I think it's easier if you just look at the path your particular version of msvc++ is using to invoke nvcc (look at the console output from one of your projects when you compile it) and prepend the commands I give above with that path. I'm not sure there's much utility in trying to build this directly into Visual Studio, unless you have a specific need to compile from ptx to an executable. There are also a few sample codes that have to do with ptx in some fashion.
Also note for completeness that ptx is not actually what's executed by the device (but generally pretty close). It is an intermediate code that can be retargeted to devices within a family by nvcc or a portion of the compiler that also lives in the GPU driver. To see the actual code executed by the device, we use the executable instead of the source code, and the tool to extract the machine assembly code is:
cuobjdump -sass mycode.exe
Similar caveats about prepending an appropriate path, if needed. I would start with the ptx. I think for what you want to do, it's enough.
1) I want to extract ptx code from a CUDA exe and use that kernel code in another program.
Is there a way to identify the kernel ptx code from an exe? I know they are arbitrarily laid out in the exe file's data section.
I learnt that in Mac executables the ptx kernels start with .version and end with a null string. Is there something like that for Windows exe (PE) files? I guess I need to parse the exe file, gather ptx statements one at a time and group them together as kernels, but I don't know how I would go about it. Some help would get me started. I also find a .nvFatBi section in the CUDA exe. What is that supposed to be?
2) I also learnt that there are global constructors which register the cubin with the CUDA runtime. I don't understand this part completely. Does the function cudaRegisterFatBinary come into play here? If so, how can I use this ptx to supply the pointer to cudaRegisterFatBinary? I understand I have to compile the ptx to a cubin file. Is that possible programmatically? In short, I want to emulate nvcc itself in some sense.
Try: cuobjdump --dump-ptx [executable-name]