I am developing/debugging a c++ code which extensively uses c++ STL vectors and blitz cpp arrays
(vectors/arrays are multidimensional, upto 4D/5D arrays)
I am currently using cout/print to log the outputs of inputs/outputs of functions but it is getting very tedious. To be able to print the vectors/arrays while debugging, can you suggest any options.
I thought of a couple of options
(a) write template functions on c++ to print and use GDBs "call" feature. but unable to use the "call" functionality of GDB for c++ template functions but works for normal functions though.
(b) Is it possible to pass c++ variables to python interface of GDB and print them ? any examples for the same ?
I googled before posting this question, but did not find any useful thread.
Any help is highly appreciated (even if some links can be provided)
Thanks a lot in advance !
Writing code in C++ to print the array and call it from gdb is certainly an option, but it might be unreliable because the print function you write might not be accessible (the linker might have dropped it because it was not used in your c++ code, for instance). Also, remember that templates are just "recipes" and you actually need to use them in order for the compiler to generate a class/function from it.
Is it possible to pass c++ variables to python interface of GDB and print them ? any examples for the same ?
A simple answer to this is "yes". You can use the parse_and_eval function in the gdb module when you use gdb's python API. Something such as
py print(gdb.parse_and_eval('your_variable'))
would print the value of a variable called your_variable using gdb's python API. But just that would be the same as just p your_variable in gdb's regular prompt without using the python API. The real power comes when you use gdb's python API to write pretty-printers for the types you want to debug.
A pretty-printer is basically just some code that you or someone else wrote to tell gdb how to print some type in a nice way. With a pretty-printer for a type just p your_variable in gdb's prompt prints the variable in the nice way defined by your pretty-printer.
I couldn't find a pretty-printer for blitz with a quick google search and I haven't used blitz before. However, I have used another library for vectors and matrices in scientific computing called armadillo and thus faced similar problems. I have thus written some pretty printers for armadillo here that might help you in case you decide to write pretty printers for blitz.
As an illustration, below you can see how the arma::mat (a matrix of doubles) type from armadillo is printed in gdb without a pretty printer (the m1 variable, which is a 6x3 matrix of doubles)
Notice that we can't even see the matrix elements. They are stored in a continuous memory region pointed by the mem attribute of the arma::mat object.
Now the same matrix with the pretty printer available here.
That makes debugging code a lot easier.
Note: You can also write pretty printers in the guile language, but I bet python is a much more common choice.
Specifically, my issue is that I have CUDA code that needs <curand_kernel.h> to run. This isn't included by default in NVRTC. Presumably then when creating the program context (i.e. the call to nvrtcCreateProgram), I have to send in the name of the file (curand_kernel.h) and also the source code of curand_kernel.h? I feel like I shouldn't have to do that.
It's hard to tell; I haven't managed to find an example from NVIDIA of someone needing standard CUDA files like this as a source, so I really don't understand what the syntax is. Some issues: curand_kernel.h also has includes... Do I have to do the same for each of these? I am not even sure the NVRTC compiler will even run correctly on curand_kernel.h, because there are some language features it doesn't support, aren't there?
Next: if you've sent in the source code of a header file to nvrtcCreateProgram, do I still have to #include it in the code to be executed / will it cause an error if I do so?
A link to example code that does this or something like it would be appreciated much more than a straightforward answer; I really haven't managed to find any.
You have to send the "filename" and the source of each header separately.
When the preprocessor does its thing, it'll use any #include filenames as a key to find the source for the header, based on the collection that you provide.
I suspect that, in this case, the compiler (driver) doesn't have file system access, so you have to give it the source in much the same way that you would for shader includes in OpenGL.
So:
Include your header's name when calling nvrtcCreateProgram. The compiler will, internally, generate the equivalent of a std::map<string,string> containing the source of each header indexed by the given name.
In your kernel source, use #include "foo.cuh" as usual.
The compiler will use foo.cuh as an index or key into its internal map (created when you called nvrtcCreateProgram), and will retrieve the header source from that collection
Compilation proceeds as normal.
One of the reasons that nvrtc provides only a "subset" of features is that the compiler plays in a somewhat sandboxed environment, without necessarily having all of the supporting tools and utilities lying around that you have with offline compilation. So, you have to manually handle a lot of the stuff that the normal nvcc + (gcc | MSVC| clang) combination provides.
A possible, but non-ideal, solution would be to preprocess the file that you need in your IDE, save the result and then #include that. However, I bet there is a better way to do that. if you just want curand, consider diving into the library and extracting the part you need (blech) or using another GPU-friendly rand implementation. On older CUDA versions, I just generated a big array of random floats on the host, uploaded it to the GPU, and sampled it in the kernels.
This related link may be helpful.
You do not need to load curand_kernel.h yourself and add it to the include "aliases" mechanism.
Instead, you can simply add the CUDA include directory to your (set of) include paths, e.g. by adding --include-path=/usr/local/cuda/include to your NVRTC compiler options.
(I do this in my GPU-kernel-runner test harness, by default, to be on the safe side.)
Essentially I want to rewrite a binary file to perform additional tasks regarding its actual tasks.
Regarding binary rewriting the process seems to be following:
Create a Control Flow Graph from an existing binary
Create a Code Snippet with the desired changes in an appropriate format
Create a binary file from the modified CFG
I came across a couple of tools, which either won't compile on my ubuntu 12.04, are not available for download or I can not find a decent tutorial / howto on how to hot patch / rewrite a binary. Those tools are:
ParseAPI, Code-Surfer/x86, EEL, LEEL, Jakstab, DynInst, Diablo + Lancet
To be more precise I want to analyze a given binary for its most frequently used functions and change it in such a way that before executing these functions, a given set of instructions are performed.
These instructions comprise of loading an array of stored bytes, reading a byte at a certain position and comparing it with a pre-defined value.
I want to make sure that the binary definitely executes these instructions during every trial.
There are 2 alternative approaches I came across which basically alter standard c functions (like memcpy(), strcpy(), printf(), etc.) since I assume these functions to be part of the binary with high probability:
LD_PRELOAD: Define my own libraries and let them get loaded before the ordinary ones
Compile the binary (of sourcecode is given) with own versions of the standard functions using something like gcc -fno-builtin -o strcpy strcpy.c
Drawback of this approach is that eventhough I subsitute standard c functions they do not necessarily have to get called, hence my instruction will not get executed neither.
Do you guys have experience regarding binary rewriting or do your have clues for accomplishing this rather exotic task?
Best regards!
BAP and Dyninst would help you. You may use BAP (http://bap.ece.cmu.edu/) to get the control flow graph of a binary. It have a very easy to use utility to create control flow graph from binaries. And you may use dyninst to instrument binaries and perform your desired operations. BAP absolutely runs on ubuntu12.04. Dyninst might not compile on 12.04 (there might be some linking problems). A simple walk around is that you do instrumentation on 10.04 and run the rewritten binaries on 12.04. Both tools are free.
I am a new one to Common Lisp (using Clozure Common Lisp under Microsoft Windows), who is familiar with c and python before. So maybe the questions are stupid here, but be patient to give me some help.
1) What's is the usual way to run a common lisp script?
Now, I wrote a bat file under windows to call ccl exe(wx86cl.exe) and evaluate (progn (load "my_script_full_path") (ccl:quit)) every time when I want to "run" my script. Is this a standard way to "run" a script for common lisp?
Any other suggestion about this?
2) What's the difference between (require 'cxml) and (asdf:operate 'asdf:load-op :cxml)?
They are seems to be the same for my script, which one should I use?
3) ignore it, not a clear question
4) When I want to load some library (such as require 'cxml), it always takes time(3s or even 5s) to load cxml every time when I "run" my script, there is also much log to standard output I show below, it seems like checking something internal. Does it means I have to spent 3-5s to load cxml every time when I want to run a simple test? It seems like a little inefficient and the output is noisy. Any suggestion?
My Script
(require 'cxml) (some-code-using-cxml)
And the output
; Loading system definition from D:/_play_/lispbox-0.7/quicklisp/dists/quicklisp/software/cxml-20101107-git/cxml.asd into #<Package "ASDF0">
;;; Checking for wide character support... yes, using code points.
; Registering #<SYSTEM "cxml-xml">
......
some my script output
---EDIT TO ADD MORE----
5) I must say that I almost forget the way of dumping image to accelerate the loading speed of lisp library. So, what is the normal process for us to develop a (maybe very simple) lisp script?
Base on the answer of what I got now, I guess maybe
a) edit your script
b) test it via a REPL environment, SLIME is a really good choice, and there should be many loop between a <==> b
c) dump the image to distribute it?( I am no sure about this)
6) Furthermore, what is the common way/form for us to release/distribute the final program?
For a lisp library, we just release our source code, and let someone else can "load/require" them.
For a lisp program, we dump a image to distribute it when we confirm that all functions go well.
Am I right?
What form do we use in a real product? Do we always dump all the thing into a image at final to speed up the loading speed?
1) Yes, the normal way to run a whole programme is to use a launcher script. However, windows has much, much better scripting support these days than just the bat interpreter. Windows Scripting Host and PowerShell ship as standard.
1a) During development, it is usual to simply type things in a the REPL (Read-Eval-Print-Loop, i.e. the lisp command line), or to use something like SLIME (for emacs or xemacs) as a development environment. If you don't know what they are, look them up. You may wish to use Cygwin to install xemacs, which will give you access to a range of linux-ish tools.
2) Require is, IIRC, a part of the standard. ASDF is technically not, it is a library that operates to make libraries work more conveniently. ASDF has a bunch of features that you will eventually want if you really get into writing large Lisp programmes.
3) Question unclear, pass.
4) See 1a) - do your tests and modifications in a running instance, thus avoiding the need to load the library more than once (just as you would in Python - you found the python repl, right?). In addition, when your programme is complete, you can probably dump an image which has all of your libraries pre-loaded.
Edit: additional answers:
5) Yes
6) Once you have dumped the image, you will still need to distribute the lisp binary to load the memory image. To make this transparent to the user, you will also have to have a loader script (or binary) to run the lisp binary with the image.
You don't have to start the lisp from scratch and load everything over again each time you want to run a simple test. For more efficient development, interactively evaluate code in the listener (REPL) of a running lisp environment.
For distribution, I use Zachary Beane's Buildapp tool. Very easy to install and use.
Regarding distribution -
I wrote a routine (it's at home and unavailable at the moment) that will write out the current image as a standard executable and quit. It works for both CLISP and SBCL.
I can rummage it up if you like.
In Unix shell programming the pipe operator is an extremely powerful tool. With a small set of core utilities, a systems language (like C) and a scripting language (like Python) you can construct extremely compact and powerful shell scripts, that are automatically parallelized by the operating system.
Obviously this is a very powerful programming paradigm, but I haven't seen pipes as first class abstractions in any language other than a shell script. The code needed to replicate the functionality of scripts using pipes seems to always be quite complex.
So my question is why don't I see something similar to Unix pipes in modern high-level languages like C#, Java, etc.? Are there languages (other than shell scripts) which do support first class pipes? Isn't it a convenient and safe way to express concurrent algorithms?
Just in case someone brings it up, I looked at the F# pipe-forward operator (forward pipe operator), and it looks more like a function application operator. It applies a function to data, rather than connecting two streams together, as far as I can tell, but I am open to corrections.
Postscript: While doing some research on implementing coroutines, I realize that there are certain parallels. In a blog post Martin Wolf describes a similar problem to mine but in terms of coroutines instead of pipes.
Haha! Thanks to my Google-fu, I have found an SO answer that may interest you. Basically, the answer is going against the "don't overload operators unless you really have to" argument by overloading the bitwise-OR operator to provide shell-like piping, resulting in Python code like this:
for i in xrange(2,100) | sieve(2) | sieve(3) | sieve(5) | sieve(7):
print i
What it does, conceptually, is pipe the list of numbers from 2 to 99 (xrange(2, 100)) through a sieve function that removes multiples of a given number (first 2, then 3, then 5, then 7). This is the start of a prime-number generator, though generating prime numbers this way is a rather bad idea. But we can do more:
for i in xrange(2,100) | strify() | startswith(5):
print i
This generates the range, then converts all of them from numbers to strings, and then filters out anything that doesn't start with 5.
The post shows a basic parent class that allows you to overload two methods, map and filter, to describe the behavior of your pipe. So strify() uses the map method to convert everything to a string, while sieve() uses the filter method to weed out things that aren't multiples of the number.
It's quite clever, though perhaps that means it's not very Pythonic, but it demonstrates what you are after and a technique to get it that can probably be applied easily to other languages.
You can do pipelining type parallelism quite easily in Erlang. Below is a shameless copy/paste from my blogpost of Jan 2008.
Also, Glasgow Parallel Haskell allows for parallel function composition, which amounts to the same thing, giving you implicit parallelisation.
You already think in terms of
pipelines - how about "gzcat
foo.tar.gz | tar xf -"? You may not
have known it, but the shell is
running the unzip and untar in
parallel - the stdin read in tar just
blocks until data is sent to stdout by
gzcat.
Well a lot of tasks can be expressed
in terms of pipelines, and if you can
do that then getting some level of
parallelisation is simple with David
King's helper code (even across erlang
nodes, ie. machines):
pipeline:run([pipeline:generator(BigList),
{filter,fun some_filter/1},
{map,fun_some_map/1},
{generic,fun some_complex_function/2},
fun some_more_complicated_function/1,
fun pipeline:collect/1]).
So basically what he's doing here is
making a list of the steps - each step
being implemented in a fun that
accepts as input whatever the previous
step outputs (the funs can even be
defined inline of course). Go check
out David's blog entry for the
code and more detailed explanation.
magrittr package provides something similar to F#'s pipe-forward operator in R:
rnorm(100) %>% abs %>% mean
Combined with dplyr package, it brings a neat data manipulation tool:
iris %>%
filter(Species == "virginica") %>%
select(-Species) %>%
colMeans
You can find something like pipes in C# and Java, for example, where you take a connection stream and put it inside the constructor of another connection stream.
So, you have in Java:
new BufferedReader(new InputStreamReader(System.in));
You may want to look up chaining input streams or output streams.
Thanks to all of the great answers and comments, here is a summary of what I learned:
It turns out that there is an entire paradigm related to what I am interested in called Flow-based programming. A good example of a language designed specially for flow-based programming is Hartmann pipelines. Hartamnn pipelines generalize the idea of streams and pipes used in Unix and other OS's, to allows for multiple input and output streams (rather than just a single input stream, and two output streams). Erlang contains powerful abstractions that make it easy to express concurrent processes in a manner which resembles pipes. Java provides PipedInputStream and PipedOutputStream which can be used with threads to achieve the same kind of abstractions in a more verbose manner.
I think the most fundamental reason is because C# and Java tend to be used to build more monolithic systems. Culturally, it's just not common to even want to do pipe-like things -- you just make your application implement the necessary functionality. The notion of building a multitude of simple tools and then gluing them together in arbitrary ways just isn't common in those contexts.
If you look at some of the scripting languages, like Python and Ruby, there are some pretty good tools for doing pipe-like things from within those scripts. Check out the Python subprocess module, for example, which allows you to do things like:
proc = subprocess.Popen('cat -',
shell=True,
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,)
stdout_value = proc.communicate('through stdin to stdout')[0]
print '\tpass through:', stdout_value
Are you looking at the F# |> operator? I think you actually want the >> operator.
Usually you just don't need it and programs run faster without it.
Basically piping is consumer/producer pattern. And it's not that hard to write those consumers and producers because they don't share much data.
Piping for Python : pypes
Mozart-OZ can do pipes using ports and threads.
Objective-C has the NSPipe class. I use it quite frequently.
I've had a lot of fun building pipeline functions in Python. I have a library I wrote, I put the contents and a sample run here. The best fit me for has been XML processing, described in this Wikipedia article.
You can do pipe like operations in Java by chaining/filtering/transforming iterators.
You can use Google's Guava Iterators.
I will say even with the very helpful guava library and static imports its still ends up being lots of Java code.
In Scala its quite easy to make your own pipe operator.
Streaming libraries based on coroutines have existed in Haskell for quite some time now. Two popular examples are conduit and pipes.
Both libraries are well-written and well-documented, and are relatively mature. The Yesod web framework is based on conduit, and it's pretty damn fast. Yesod is competitive with Node on performance, even beating it in a few places.
Interestingly, all of the these libraries are single-threaded by default. This is because the single motivating use case for pipelines is servers, which are I/O bound.
Since R added pipe operator today, it's worth to mention Julialang has pipe all a long:
help?> |>
search: |>
|>(x, f)
Applies a function to the preceding argument. This allows for easy function chaining.
Examples
≡≡≡≡≡≡≡≡≡≡
julia> [1:5;] |> x->x.^2 |> sum |> inv
0.01818181818181818
if you're still interested in an answer...
you can look at factor, or the older joy and forth for the concatenative paradigm.
in arguments and out arguments are implicit, dumped to a stack. then the next word (function) takes that data and does something with it.
the syntax is postfix.
"123" print
where print takes one argument, whatever is in the stack.
You can use my library in python: github.com/sspipe/sspipe
In Mathematica, you can use //
for example
f[g[h[x,parm1],parm2]]
quite a mess.
could be written as
x // h[#, parm1]& // g[#, parm2]& // f
the # and & is lambda in Mathematica
In js, there seems to be pipe operator |> soon.
https://github.com/tc39/proposal-pipeline-operator