I am using Octave on Ubuntu 11.10.
I am trying to compute the FFT of a large number of data points in Octave.
When I enter a vector with a large number of elements (around 1000 to 1500), Octave hangs.
x=[1 2 3 4.....1500]
Has anyone faced the same problem, or is it generally slow?
Please forgive what may be an insane question.
My setup: two machines, one with a GTX1080, one with an RX Vega 64. Retraining/fine tuning a bvlc_googlenet model on the GTX1080.
If I build Caffe for the Vega 64, then can I take a snapshot from the GTX1080 machine and restart training on the Vega 64? Would this work in the sense that the training would continue in a normal manner?
What if I moved the GTX1080 snapshot to a Volta V100 in AWS? Would this work?
I know Caffe will to some degree abstract the hardware, but I don't know how well it can do that. I need the GTX 1080 for something else...
Thanks in advance!
To my knowledge, this should work without a problem. Weight files and training snapshots are just blobs of numbers that you should be able to resume from on other hardware (e.g. CPU or GPU), on a different machine with a different operating system, or between 32-bit and 64-bit processes.
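For example, resuming on the new machine should just be a matter of restoring the solver state. A minimal pycaffe sketch; the solver and snapshot file names here are placeholders, not taken from your setup:

import caffe

caffe.set_mode_gpu()
caffe.set_device(0)

# Load the same solver definition used on the original machine,
# then restore the optimizer state and weights from the snapshot.
solver = caffe.get_solver('solver.prototxt')
solver.restore('bvlc_googlenet_iter_10000.solverstate')  # hypothetical snapshot name

# Training continues from the saved iteration.
solver.solve()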
Based on this GitHub repository https://github.com/dennybritz/cnn-text-classification-tf , I want to classify my datasets on Ubuntu 16.04 using a GPU.
To run on the GPU, I changed line 23 of text_cnn.py to this: with tf.device('/gpu:0'), tf.name_scope("embedding"):
My first training dataset has 9000 documents and is about 120 MB in size, and
the second training dataset has 1300 documents and is about 1 MB.
After running on my Titan X GPU server, I get errors.
How can I solve this issue?
Thanks.
You are getting an out-of-memory error, so the first thing to try is a smaller batch size
(the default is 64). I would start with:
./train.py --batch_size 32
Most of the memory is used to hold the embedding parameters and convolution parameters. I would suggest reducing:
EMBEDDING_DIM
NUM_FILTERS
BATCH_SIZE
Try embedding_dim=16, batch_size=16, and num_filters=32; if that works, increase them 2x at a time.
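To see why these knobs dominate memory, here is a rough back-of-envelope sketch; the vocabulary size and document length are assumptions (not from your post), and the other values are, as far as I recall, the repo defaults:

vocab_size = 50000            # assumption; depends on your corpus
seq_len = 2000                # assumption; inputs are padded to the longest document
embedding_dim = 128           # repo default
filter_sizes = [3, 4, 5]      # repo default
num_filters = 128             # repo default
batch_size = 64               # repo default
bytes_per_float = 4

embedding_table = vocab_size * embedding_dim * bytes_per_float
conv_weights = sum(fs * embedding_dim * num_filters for fs in filter_sizes) * bytes_per_float
# Activations scale with batch_size: the embedded batch plus one feature map per filter size.
activations = batch_size * seq_len * (embedding_dim + len(filter_sizes) * num_filters) * bytes_per_float

print("embedding table: %8.1f MB" % (embedding_table / 1e6))
print("conv weights:    %8.1f MB" % (conv_weights / 1e6))
print("activations:     %8.1f MB" % (activations / 1e6))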
Also, if you are using a Docker virtual machine to run TensorFlow, you might be limited to only 1 GB of memory by default even though your machine has 16 GB. See here for more details.
I was trying to run FCN on my data in Caffe. I was able to convert my image sets into LMDB with Caffe's built-in convert_imageset tool. However, once I tried to train the net, it gave me the following error:
Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
.....
Aborted (core dumped)
I went through many online resources on this memory failure, and most of them suggest reducing the batch size. I even reduced the size of the images to 256x256, but I still could not resolve the issue.
I checked the GPU memory with nvidia-smi; the card is an Nvidia GT 730 with 1998 MiB of memory. Since the batch size in train_val.prototxt is already 1, there is nothing more I can do in train_val.prototxt. So my questions are:
Looking at the log in the terminal, I noticed that whenever convert_imageset converts the data into LMDB, it takes 1000 images in a group. Is it possible to change this number in lines 143 and 151 of convert_imageset.cpp to something smaller (for example 2, to take two images at a time), recompile Caffe, and then convert the images to LMDB using convert_imageset? Does that make sense?
If the answer to question 1 is yes, how can I compile Caffe again? Should I remove the build folder and redo the Caffe installation from scratch?
How does Caffe process the LMDB data? Does it take a batch out of those groups of 1000 images shown while running convert_imageset?
Your help is really appreciated.
Thanks...
AFAIK, the number of entries committed to LMDB in each transaction (txn->Commit();) has no effect on the CUDA out-of-memory error.
If you do want to re-compile caffe for whatever reason, simply run make clean. This will clear everything and let you re-compile from scratch.
Again, AFAIK, Caffe reads batch_size images from the LMDB at a time, regardless of the transaction size used when writing the dataset (see the sketch below).
Are you sure batch_size is set to 1 for both TRAIN and TEST phases?
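To convince yourself that the 1000-entry commit interval is purely a write-side detail, you can walk the LMDB one record at a time. A small sketch, assuming pycaffe and the Python lmdb package are available; the database path is a placeholder:

import lmdb
from caffe.proto import caffe_pb2

env = lmdb.open('train_lmdb', readonly=True)  # hypothetical LMDB path
with env.begin() as txn:
    cursor = txn.cursor()
    for i, (key, value) in enumerate(cursor):
        datum = caffe_pb2.Datum()
        datum.ParseFromString(value)
        # Each record is a single image; the data layer simply pulls
        # batch_size of these records per iteration, independently of
        # how many were committed per transaction by convert_imageset.
        print(key, datum.channels, datum.height, datum.width)
        if i == 2:
            break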
I'm preparing an acceptance test for a new machine with Nvidia graphics cards and I'd like a simple CUDA program that will fully exercise the GPU for a full day. The intent is to generate large amounts of heat and ensure the new machine is stable under the load. I'd like the code to be very easy to compile and run (no dependencies, no large input data sets), and also very easy to verify (small amounts of output). Also, I'd like it to be command-line only, no GUI (the test will have to be automated).
I was originally thinking of repeatedly running Vector Dot Products of large vectors. However, that's mostly memory-intensive. So if the GPUs are constantly waiting on memory accesses, then they probably aren't generating as much heat as they could.
I'm running on a CentOS Linux machine.
Does anyone have any suggestions?
You didn't mention which OS you are on.
Ideally, you would want to stress the floating point units, the logic/integer units, the GPU memory, the GPU voltage regulators (VRMs) and the main PSU. I don't think there is any single utility out there that does that.
Memory:
http://sourceforge.net/projects/cudagpumemtest/
Integer (?):
http://sourceforge.net/projects/cudalucas/
PSU and VRMs (In the past, this program could cause GPUs to run out-of-spec, breaking the card. I don't think that's the case anymore):
http://www.ozone3d.net/benchmarks/fur/
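If you would rather script a compute-bound load yourself (addressing the concern that dot products are memory-bound), repeated large matrix multiplications keep the floating-point units busy rather than waiting on memory. A minimal sketch, assuming a Python environment with CuPy is acceptable on the test machine; the same idea can be written directly against cuBLAS if you need a dependency-free CUDA program:

import time
import cupy as cp

n = 8192  # matrix size; tune so the working set fits in GPU memory
a = cp.random.rand(n, n, dtype=cp.float32)
b = cp.random.rand(n, n, dtype=cp.float32)

# Back-to-back SGEMMs are compute-bound, so the SMs stay busy for the
# whole soak period instead of stalling on memory accesses.
end = time.time() + 24 * 3600  # run for a full day
while time.time() < end:
    c = a @ b
    cp.cuda.Device().synchronize()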
I am trying to read an mxArray from MATLAB into my custom-made .cu file.
I have two sparse matrices to operate on.
How do I read them into CUSP sparse matrices, say A and B (or into cuSPARSE matrices), so that I can perform operations on them and return the results to MATLAB?
One idea I could come up with is to write the mxArrays to a .mtx file and then read
from it. But are there any alternatives?
Further, I am trying to understand the various CUSP mechanisms using the examples posted on its website. But every time I try to compile and run the examples, I get the following error:
terminate called after throwing an instance of
'thrust::system::detail::bad_alloc'
what(): N6thrust6system6detail9bad_allocE: CUDA driver version is
insufficient for CUDA runtime version
Abort
Here is what is installed on the machine I am using:
CUDA v4.2
Thrust v1.6
Cusp v0.3
I am using GTX 480 with Linux x86_64 on my machine.
Strangely enough, the device query code also returns this output:
CUDA Device Query...
There are 0 CUDA devices.
Press any key to exit...
I updated my drivers and SDK a few days ago.
Not sure what's wrong.
I know I am asking a lot in one question, but I have been facing this problem for quite a while, and upgrading and downgrading the drivers doesn't seem to solve it.
Cheers
This error is most revealing, "CUDA driver version is insufficient for CUDA runtime version". You definitely need to update your driver.
I use CUSPARSE/CUSP through Jacket's Sparse Linear Algebra library. It's been good, but I wish there were more sparse features available in CUSPARSE/CUSP. I hear Jacket is going to get CULA Sparse into it soon, so that'll be nice.