I am trying to integrate a vector-valued function f[] with 49 components, which as an example may look like:
f[0] = 1
f[1] = cos(x)
f[2] = cos(2x)
f[3] = cos(3x)
... and so on.
I was wondering if there is a way to integrate such a vector function in GSL using a single command. Currently I can only do this by setting up n = 49 different cquad integration handles/procedures, which seems inefficient, since I would like to use the same integration "mesh" for all of the function components.
Thank you for your attention.
As far as I know, this cannot currently be done in GSL with a single call.
Your task can, however, be parallelized with OpenMP (using #pragma omp parallel for), although there may be considerable overhead when the GSL library is used by several threads at the same time.
It is possible (I am not certain) that you would have to rebuild the GSL library itself with the OpenMP compiler flags, but GSL is largely thread-safe on its own.
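For what it's worth, here is a minimal sketch of that per-component approach in C, assuming the cos(k*x) components from the question and made-up limits and tolerances; each thread allocates its own cquad workspace, since a workspace must not be shared between threads:

#include <math.h>
#include <stdio.h>
#include <gsl/gsl_integration.h>

/* k-th component of the example vector function: f[0] = 1, f[k] = cos(k*x) */
static double component(double x, void *params)
{
    int k = *(int *)params;
    return k == 0 ? 1.0 : cos(k * x);
}

int main(void)
{
    enum { N = 49 };
    double result[N], error[N];

    /* One independent cquad integration per component.
       Compile with: gcc -fopenmp example.c -lgsl -lgslcblas -lm */
    #pragma omp parallel for
    for (int k = 0; k < N; ++k) {
        gsl_integration_cquad_workspace *w = gsl_integration_cquad_workspace_alloc(100);
        gsl_function F = { &component, &k };   /* k is private to each iteration */
        size_t nevals;
        gsl_integration_cquad(&F, 0.0, 1.0, 1e-10, 1e-10, w, &result[k], &error[k], &nevals);
        gsl_integration_cquad_workspace_free(w);
    }

    for (int k = 0; k < N; ++k)
        printf("integral of f[%d] on [0,1] = %.12f\n", k, result[k]);
    return 0;
}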
When the GPU is shared with other processes (e.g. Xorg or other CUDA processes), a CUDA process should preferably not consume all remaining memory but instead grow its usage dynamically.
(There are various errors you might get indirectly from this, like Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR. But this question is not about that.)
(In TensorFlow, you would use allow_growth=True in the GPU options to accomplish this. But this question is not about that.)
Is there a simple way to check if the GPU is currently used by other processes? (I'm not asking whether it is configured to be used for exclusive access.)
I could parse the output of nvidia-smi and look for other processes. But that seems somewhat hacky, maybe not so reliable, and not simple enough.
(My software is using TensorFlow, so if TensorFlow provides such a function, nice. But if not, I don't care whether it is a C API or a Python function. I would prefer to avoid other external dependencies, though, except those I'm using anyway, like CUDA itself or TensorFlow. I'm not afraid to use ctypes. So consider this question language-agnostic.)
There is nvmlDeviceGetComputeRunningProcesses and nvmlDeviceGetGraphicsRunningProcesses. (Documentation.)
This is a C API, but I could use pynvml if I don't care about the extra dependency.
Example usage (via).
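Roughly like this, using the NVML C API; this is only a sketch (error handling is mostly omitted, and the process-info struct has grown extra fields in newer NVML versions), so treat it as a starting point. Link with -lnvidia-ml.

#include <stdio.h>
#include <nvml.h>

int main(void)
{
    nvmlInit();

    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(0, &dev);

    /* With count = 0 and a NULL buffer this only asks how many compute processes exist. */
    unsigned int count = 0;
    nvmlReturn_t ret = nvmlDeviceGetComputeRunningProcesses(dev, &count, NULL);

    if (ret == NVML_SUCCESS) {
        printf("no compute processes on GPU 0\n");
    } else if (ret == NVML_ERROR_INSUFFICIENT_SIZE) {
        nvmlProcessInfo_t infos[64];
        count = 64;
        nvmlDeviceGetComputeRunningProcesses(dev, &count, infos);
        for (unsigned int i = 0; i < count; ++i)
            printf("pid %u uses %llu bytes of GPU memory\n",
                   infos[i].pid, (unsigned long long)infos[i].usedGpuMemory);
    }

    nvmlShutdown();
    return 0;
}

The same calls are exposed by pynvml if you would rather stay in Python.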
So, I'm wondering how to implement an STFT in Julia, possibly using a Hamming window. I can't find anything on the internet.
What is the best way to do it? I'd prefer not to use Python libraries, but pure native Julia if possible. Maybe it's a feature still being developed in Julia...?
Thanks!
I'm not aware of any native STFT implementation in Julia. As David stated in his comment, you will have to implement this yourself. This is fairly straightforward:
1) Break up your signal into short time sections
2) Apply your window of choice (in your case, Hamming)
3) Take the FFT using Julia's fft function.
All of the above are fairly standard mathematical operations, and you will find plenty of references online. The code below generates a Hamming window if you don't have one already (watch out for Julia's 1-based indexing when using online references, as a lot of signal-processing reference material uses 0-based indexing when describing window generation).
N = 256                        # window length (use your section length)
Wb = zeros(N)                  # preallocate the window

## Hamming Window Generation
for n in 1:N
    Wb[n] = 0.54 - 0.46 * cos(2pi * (n - 1) / N)
end
You could also use the Hamming window function from DSP.jl.
P.S. If you are running Julia on OS X, check out the Julia interface to Apple's Accelerate framework. This provides a very fast Hamming window implementation, as well as convolution and element-wise multiplication functions that might be helpful.
I have a simple plan for my program: do all input parsing and data preprocessing with Python, then pass a lot of structures to a number-crunching function, written in C, that is called ONCE.
Here is my question: which would be faster (in terms of execution speed), calling the C function with ctypes, or writing it using Cython?
Given an infinite amount of time to optimize your code, ctypes will probably be faster, as you can push as much of the heavy lifting as possible into compiled code.
Cython takes Python-like code and converts it to C code, with lots of caveats and exceptions. If Cython were perfect, it would be as fast as coding against the C API for Python. Cython is a fantastic tool for quick code development, but it is not quite that perfect yet.
ctypes will require some manipulation of your input variables to get them into C cleanly. This may not be a problem if you are doing a fair bit of data preprocessing anyway.
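To illustrate, here is a rough sketch of a ctypes-friendly C entry point; the names crunch and libcrunch.so are made up, and the Python side is shown in the trailing comment:

/* crunch.c - build as a shared library, e.g.: gcc -O2 -shared -fPIC crunch.c -o libcrunch.so */
#include <stddef.h>

double crunch(const double *data, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc += data[i] * data[i];   /* stand-in for the real number crunching */
    return acc;
}

/* From Python, roughly:
 *
 *   import ctypes
 *   lib = ctypes.CDLL("./libcrunch.so")
 *   lib.crunch.restype = ctypes.c_double
 *   lib.crunch.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_size_t]
 *   arr = (ctypes.c_double * 3)(1.0, 2.0, 3.0)
 *   print(lib.crunch(arr, 3))
 */

Keeping the exported signature to plain pointers and integers like this is what keeps the ctypes side short.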
As usual, you need to decide how important execution speed is compared with coding time, code readability, and code reusability.
It would seem strange to me to be interested only in execution speed and still want to code any of it in Python.
Conversely, the general paradigms of Python coding tend to contrast with having large sections of C code held together by Python wrappers.
Hey there,
Is there a way to compile (or, better said, 'translate') a MATLAB m-function into a C function so that I can use it in the CUDA kernel of my mex file?
Thanks a lot!
MATLAB Coder will generate C code for mex files. I do not yet have a copy to evaluate, so I can't speak with any authority about the quality and nature of the generated code.
However, if I had to guess, I'd say the generated code would likely require a lot of massaging to get it working on your GPU. You may have better luck with a product like Jacket, depending on what you're doing.
You can call a MATLAB (m or mex) function from C/Fortran using this function call. You could then interface that with your CUDA kernel.
However, it may not be the most efficient way to do things. You could write your own C code for the m-file that you have, or look it up on MATLAB Central if anyone else has done it.
The C function would then eventually set up device variables and launch a CUDA kernel.
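Assuming the routine referred to above is mexCallMATLAB, here is a rough sketch of a mex gateway that calls back into an m-function (myfun is a placeholder name) before doing its device work:

#include "mex.h"

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    mxArray *in[1];
    mxArray *out[1];

    in[0] = mxCreateDoubleScalar(2.0);        /* example argument */
    mexCallMATLAB(1, out, 1, in, "myfun");    /* evaluates y = myfun(2.0) in MATLAB */

    double y = mxGetScalar(out[0]);
    mexPrintf("myfun returned %g\n", y);

    /* ... copy data to the device and launch the CUDA kernel here ... */

    mxDestroyArray(in[0]);
    mxDestroyArray(out[0]);
}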
I originally wanted to try this for a project because I thought this method would be easier than converting all of my MATLAB code to C first, but I ended up doing that anyway.
There are some user-created MATLAB scripts to help provide this functionality, but since they aren't from The MathWorks, you'll have to use them at your own risk. I tried them and never found anything malicious, but you never know. I couldn't get them to work with my project due to its specific complications, but they should work for simpler tasks.
1) NvMEX: This is directly from Nvidia. http://developer.stage.nvidia.com/matlab-cuda
http://www.mathworks.com/discovery/matlab-gpu.html
2) CUDA MEX: This is from a user. http://www.mathworks.com/matlabcentral/fileexchange/25314-cuda-mex
This is not really a direct answer to your question, but if your goal is simply to have your MATLAB code run on the GPU, then you may find that if you have access to Parallel Computing Toolbox, you can use GPUArrays with arrayfun. For example, if the function you wish to evaluate across many points looks like this:
function y = myFcn( x )
y = 1;
for ii = 1:10
y = sin(x * y);
end
Then you could call this on the GPU like so:
gx = gpuArray( rand(1000) );
gy = arrayfun( @myFcn, gx );
Does anyone know any good math functions that cause a lot of load on the CPU? I want to create a simple program that just creates load for X seconds while another program monitors it. I'm just looking for functions, not actual stress-testing programs.
The Computer Language Benchmark Game has quite a few benchmarks, many of which are math-based. It's a good source because the source code for each algorithm is included and there are implementations of each benchmark in dozens of languages. That way, you can just use the implementation in whatever language you're comfortable compiling and running.
Try the Lucas-Lehmer primality test. It's what's used in the Prime95 executable, and Prime95's fairly standard for CPU stress testing.
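For instance, here is a small C sketch of the Lucas-Lehmer test that can be looped to keep a core busy; it relies on the GCC/Clang __uint128_t extension, so it only handles exponents up to 63, which is plenty for load generation:

#include <stdint.h>
#include <stdio.h>

/* Lucas-Lehmer test for M_p = 2^p - 1, valid for odd prime p <= 63. */
static int lucas_lehmer(unsigned p)
{
    uint64_t m = (1ULL << p) - 1;
    uint64_t s = 4;
    for (unsigned i = 0; i < p - 2; ++i)
        s = (uint64_t)(((__uint128_t)s * s + m - 2) % m);   /* + m avoids underflow */
    return s == 0;
}

int main(void)
{
    /* Repeat the test many times so the work is measurable; summing the results
       keeps the compiler from optimizing the calls away. */
    long primes = 0;
    for (long i = 0; i < 20000000L; ++i)
        primes += lucas_lehmer(61);          /* M_61 is a known Mersenne prime */
    printf("%ld positive results\n", primes);
    return 0;
}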
A naive implementation of Fibonacci? Something like:
let fib = Seq.unfold(fun (p, c) -> Some((p, c), (c, p+c))) (1,1)