I'm having a problem installing Caffe. Please let me know if anyone has come across the same issue. Thanks.
make runtest
.build_release/test/test_all.testbin 0 --gtest_shuffle
Cuda number of devices: 1
Setting to use device 0
Current device id: 0
Note: Randomizing tests' orders with a seed of 88789 .
[==========] Running 838 tests from 169 test cases.
[----------] Global test environment set-up.
[----------] 3 tests from ImageDataLayerTest/3, where TypeParam = caffe::DoubleGPU
[ RUN ] ImageDataLayerTest/3.TestResize
F0107 14:26:04.664185 3079 math_functions.cpp:91] Check failed: error == cudaSuccess (11 vs. 0) invalid argument
*** Check failure stack trace: ***
@ 0x2ab3f5243daa (unknown)
@ 0x2ab3f5243ce4 (unknown)
@ 0x2ab3f52436e6 (unknown)
@ 0x2ab3f5246687 (unknown)
@ 0x6bdc35 caffe::caffe_copy<>()
@ 0x7439af caffe::BasePrefetchingDataLayer<>::Forward_gpu()
@ 0x428da2 caffe::Layer<>::Forward()
@ 0x62ff53 caffe::ImageDataLayerTest_TestResize_Test<>::TestBody()
@ 0x657363 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x64de07 testing::Test::Run()
@ 0x64deae testing::TestInfo::Run()
@ 0x64dfb5 testing::TestCase::Run()
@ 0x6512f8 testing::internal::UnitTestImpl::RunAllTests()
@ 0x651587 testing::UnitTest::Run()
@ 0x41d3a0 main
@ 0x2ab3f8396ec5 (unknown)
@ 0x4243d7 (unknown)
@ (nil) (unknown)
make: *** [runtest] Aborted (core dumped)
#
Ubuntu 14.04
/$ lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation G92GLM [Quadro FX 3800M] (rev a2)
/$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 340.29 Thu Jul 31 20:23:19 PDT 2014
GCC version: gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1)
/$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2014 NVIDIA Corporation
Built on Thu_Jul_17_21:41:27_CDT_2014
Cuda compilation tools, release 6.5, V6.5.12
#
There seems to be a problem with GPU support; Caffe may not support your GPU. I would try building Caffe without GPU support. All you need to do is uncomment the line
CPU_ONLY := 1
in Makefile.config and then run make again.
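The change described above can be scripted. This is only a sketch: it assumes the option appears in Makefile.config as the commented line `# CPU_ONLY := 1` (as in the stock Makefile.config.example) and that GNU sed is available. `enable_option` is a made-up helper name, not part of Caffe.

```shell
#!/bin/sh
# Uncomment an option such as "# CPU_ONLY := 1" in a Caffe Makefile.config.
# enable_option is a hypothetical helper, not something Caffe provides.
enable_option() {
    config_file="$1"
    option="$2"
    # Strip the leading "#" only from the line that sets this option to 1.
    sed -i "s/^#[[:space:]]*\(${option}[[:space:]]*:=[[:space:]]*1\)/\1/" "$config_file"
}

# Typical use from the Caffe source directory:
#   enable_option Makefile.config CPU_ONLY
#   make clean && make all && make test && make runtest
```

Running `make clean` first is a precaution so that objects compiled with GPU support are not reused.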
I run into problems when trying to use Caffe with multiple GPUs. When I execute the following command, I get the error log shown below:
caffe train -solver $SOLVER -gpu 0,1 2>&1 | tee $LOGGING
F0409 14:17:22.355074 12079 caffe.cpp:254] Multi-GPU execution not available - rebuild with USE_NCCL
*** Check failure stack trace: ***
@ 0x2aee66002b2d google::LogMessage::Fail()
@ 0x2aee66004995 google::LogMessage::SendToLog()
@ 0x2aee660026a9 google::LogMessage::Flush()
@ 0x2aee6600542e google::LogMessageFatal::~LogMessageFatal()
@ 0x40c172 train()
@ 0x4084f3 main
@ 0x2aee78f67b35 __libc_start_main
@ 0x408f0b (unknown)
Can anyone explain what is wrong here? Is there some caffe bug which I am not aware of?
Install CUDA
Install cuDNN
Install Dependencies
$ sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler libgflags-dev libgoogle-glog-dev liblmdb-dev libatlas-base-dev git
$ sudo apt-get install --no-install-recommends libboost-all-dev
Install NCCL
NVIDIA NCCL is required to run Caffe on more than one GPU. NCCL can be installed with the following commands:
$ git clone https://github.com/NVIDIA/nccl.git
$ cd nccl
$ sudo make install -j
NCCL libraries and headers will be installed in /usr/local/lib and /usr/local/include.
Install Caffe
In Makefile.config, uncomment the line USE_CUDNN := 1. This enables cuDNN acceleration.
Uncomment the line USE_NCCL := 1. This enables NCCL which is required to run Caffe on multiple GPUs.
Save and close the file. You're now ready to compile Caffe.
$ make all -j
When this command completes, the Caffe binary will be available at build/tools/caffe.
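One way to catch the "rebuild with USE_NCCL" failure before launching training is to confirm that the option is active in the config the binary was built from. A minimal sketch, assuming the build used Makefile.config in the Caffe source directory; `nccl_enabled` is a hypothetical helper name, not part of Caffe.

```shell
#!/bin/sh
# Succeeds when "USE_NCCL := 1" is present and not commented out in the given file.
# nccl_enabled is a made-up helper for illustration.
nccl_enabled() {
    grep -Eq '^[[:space:]]*USE_NCCL[[:space:]]*:=[[:space:]]*1[[:space:]]*$' "$1"
}

# Typical use from the Caffe source directory:
#   if nccl_enabled Makefile.config; then
#       build/tools/caffe train -solver "$SOLVER" -gpu 0,1
#   else
#       echo "USE_NCCL is off: uncomment it, then run make clean && make all -j" >&2
#   fi
```

Note that this only inspects the config file. A binary built before the option was switched on still needs a rebuild.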
When I go to /usr/local/cuda/samples/1_Utilities/deviceQuery and execute
moose@pc09 /usr/local/cuda/samples/1_Utilities/deviceQuery $ sudo make clean
rm -f deviceQuery deviceQuery.o
rm -rf ../../bin/x86_64/linux/release/deviceQuery
moose@pc09 /usr/local/cuda/samples/1_Utilities/deviceQuery $ sudo make
"/usr/local/cuda-7.0"/bin/nvcc -ccbin g++ -I../../common/inc -m64 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52 -o deviceQuery.o -c deviceQuery.cpp
"/usr/local/cuda-7.0"/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52 -o deviceQuery deviceQuery.o
mkdir -p ../../bin/x86_64/linux/release
cp deviceQuery ../../bin/x86_64/linux/release
moose@pc09 /usr/local/cuda/samples/1_Utilities/deviceQuery $ ./deviceQuery
I keep getting
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version Result = FAIL
I have no idea how to fix it.
My System
moose@pc09 ~ $ cat /etc/issue
Linux Mint 17 Qiana \n \l
moose@pc09 ~ $ uname -a
Linux pc09 3.13.0-36-generic #63-Ubuntu SMP Wed Sep 3 21:30:07 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
moose@pc09 ~ $ lspci -v | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation GK110B [GeForce GTX Titan Black] (rev a1) (prog-if 00 [VGA controller])
Subsystem: NVIDIA Corporation Device 1066
Kernel driver in use: nvidia
01:00.1 Audio device: NVIDIA Corporation GK110 HDMI Audio (rev a1)
Subsystem: NVIDIA Corporation Device 1066
moose@pc09 ~ $ sudo lshw -c video
*-display
description: VGA compatible controller
product: GK110B [GeForce GTX Titan Black]
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:01:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
configuration: driver=nvidia latency=0
resources: irq:96 memory:fa000000-faffffff memory:d0000000-d7ffffff memory:d8000000-d9ffffff ioport:e000(size=128) memory:fb000000-fb07ffff
moose@pc09 ~ $ nvidia-settings -q NvidiaDriverVersion
Attribute 'NvidiaDriverVersion' (pc09:0.0): 331.79
moose@pc09 ~ $ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 331.79 Sun May 18 03:55:59 PDT 2014
GCC version: gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)
moose@pc09 ~ $ lsmod | grep -i nvidia
nvidia_uvm 34855 0
nvidia 10703828 40 nvidia_uvm
drm 303102 5 ttm,drm_kms_helper,nvidia,nouveau
moose@pc09 ~ $ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Mon_Feb_16_22:59:02_CST_2015
Cuda compilation tools, release 7.0, V7.0.27
moose@pc09 ~ $ nvidia-smi
Thu Nov 12 11:23:24 2015
+------------------------------------------------------+
| NVIDIA-SMI 331.79 Driver Version: 331.79 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX TIT... Off | 0000:01:00.0 N/A | N/A |
| 26% 35C N/A N/A / N/A | 132MiB / 6143MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Compute processes: GPU Memory |
| GPU PID Process name Usage |
|=============================================================================|
| 0 Not Supported |
+-----------------------------------------------------------------------------+
Update your NVIDIA driver. The driver you have installed (331.79) only supports CUDA 6 or lower, and you are trying to use the CUDA 7.0 toolkit with it.
I ran into this exact same error message with toolkit 8.0 on Ubuntu 16.04. I tried reinstalling the toolkit, cuDNN, and so on, and it didn't help. The solution turned out to be very simple: update to the latest NVIDIA driver. I installed NVIDIA-Linux-x86_64-367.57.run and the error went away.
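The driver/toolkit mismatch described in these answers can be checked with a small script. This is a sketch under assumptions: the minimum driver versions in the table are quoted from memory of NVIDIA's CUDA release notes and should be verified there, the helper names are made up, and the comparison relies on GNU `sort -V`.

```shell
#!/bin/sh
# Minimum Linux driver version per CUDA toolkit release.
# ASSUMPTION: values recalled from NVIDIA's release notes; verify before relying on them.
min_driver_for_cuda() {
    case "$1" in
        6.5) echo 340.29 ;;
        7.0) echo 346.46 ;;
        8.0) echo 367.48 ;;
        *)   return 1 ;;
    esac
}

# Succeeds when version $1 >= version $2 (GNU sort -V does the ordering).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Extract the driver version from the NVRM line of /proc/driver/nvidia/version (stdin).
parse_driver_version() {
    sed -n 's/.*Kernel Module *\([0-9.]*\).*/\1/p'
}

# driver_supports_cuda DRIVER_VERSION CUDA_RELEASE
# Returns 0 if new enough, 1 if too old, 2 if the release is not in the table.
driver_supports_cuda() {
    required=$(min_driver_for_cuda "$2") || return 2
    version_ge "$1" "$required"
}

# Typical use:
#   driver=$(parse_driver_version < /proc/driver/nvidia/version)
#   driver_supports_cuda "$driver" 7.0 || echo "driver $driver is too old for CUDA 7.0"
```

With the values from the question (driver 331.79, toolkit 7.0) the check fails, which matches the deviceQuery error.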
My two cents:
this error may be related to the selected GPU mode (Performance/Power Saving Mode). When you select the integrated Intel GPU (with the nvidia-settings utility) and run the deviceQuery sample, you get this error:
-> CUDA driver version is insufficient for CUDA runtime version
But this error is misleading: after selecting the NVIDIA GPU (Performance mode) again with the nvidia-settings utility, the problem disappears.
It is not a version problem (in my scenario).
Regards
I'm trying to install the ATLAS library (v3.10.1) on my Tegra3 ARM CPU (Cortex-A9) under Ubuntu 11.04, but I cannot get past the configuration step.
When I launch the "./configure" executable I get the following output:
make: `xconfig' is up to date.
./xconfig -d s /home/ubuntu/Libraries/ATLAS/build/.././ -d b /home/ubuntu/Libraries/ATLAS/build -D c -DATL_ARM_HARDFP=1 -Ss ADdir /Libraries/ATLAS/build/ARMHARDFP -Si archdef 0 -Fa alg -mfloat-abi=hard
OS configured as Linux (1)
Assembly configured as GAS_ARM (7)
Vector ISA Extension configured as NEON (10,1024)
Architecture configured as ARMv7 (46)
Bad CPU MHZ value=0, res='CPU MHZ=0
'
Clock rate configured as 0Mhz
Maximum number of threads configured as 4
Parallel make command configured as '$(MAKE) -j 4'
Pointer width configured as 32
Cannot detect CPU throttling.
rm -f config1.out
make atlas_run atldir=/home/ubuntu/Libraries/ATLAS/build exe=xprobe_comp redir=config1.out \
args="-v 0 -o atlconf.txt -O 1 -A 46 -Si nof77 0 -V 1024 -Fa ic '-mfloat-abi=hard' -Fa sm '-mfloat-abi=hard' -Fa dm '-mfloat-abi=hard' -Fa sk '-mfloat-abi=hard' -Fa dk '-mfloat-abi=hard' -Fa xc '-mfloat-abi=hard' -Fa gc '-mfloat-abi=hard' -Fa if '-mfloat-abi=hard' -b 32 -d b /home/ubuntu/Libraries/ATLAS/build"
make[1]: Entering directory `/home/ubuntu/Libraries/ATLAS/build'
cd /home/ubuntu/Libraries/ATLAS/build ; ./xprobe_comp -v 0 -o atlconf.txt -O 1 -A 46 -Si nof77 0 -V 1024 -Fa ic '-mfloat-abi=hard' -Fa sm '-mfloat-abi=hard' -Fa dm '-mfloat-abi=hard' -Fa sk '-mfloat-abi=hard' -Fa dk '-mfloat-abi=hard' -Fa xc '-mfloat-abi=hard' -Fa gc '-mfloat-abi=hard' -Fa if '-mfloat-abi=hard' -b 32 -d b /home/ubuntu/Libraries/ATLAS/build > config1.out
sh: Syntax error: EOF in backquote substitution
sh: Syntax error: EOF in backquote substitution
sh: Syntax error: EOF in backquote substitution
/usr/bin/ld: error: /tmp/cck4AYUv.o uses VFP register arguments, xctest does not
/usr/bin/ld: failed to merge target specific data of file /tmp/cck4AYUv.o
collect2: ld returned 1 exit status
make[2]: *** [IRunCComp] Error 1
/usr/bin/ld: error: /tmp/ccuMjBW4.o uses VFP register arguments, xctest does not
/usr/bin/ld: failed to merge target specific data of file /tmp/ccuMjBW4.o
collect2: ld returned 1 exit status
make[2]: *** [IRunCComp] Error 1
Unable to find usable compiler for ICC; abortingMake sure compilers are in your path, and specify good compilers to configure
(see INSTALL.txt or 'configure --help' for details)make[1]: *** [atlas_run] Error 1
make[1]: Leaving directory `/home/ubuntu/Libraries/ATLAS/build'
make: *** [IRun_comp] Error 2
ERROR 512 IN SYSCMND: 'make IRun_comp args="-v 0 -o atlconf.txt -O 1 -A 46 -Si nof77 0 -V 1024 -Fa ic '-mfloat-abi=hard' -Fa sm '-mfloat-abi=hard' -Fa dm '-mfloat-abi=hard' -Fa sk '-mfloat-abi=hard' -Fa dk '-mfloat-abi=hard' -Fa xc '-mfloat-abi=hard' -Fa gc '-mfloat-abi=hard' -Fa if '-mfloat-abi=hard' -b 32"'
mkdir src bin tune interfaces
mkdir: cannot create directory `src': File exists
mkdir: cannot create directory `bin': File exists
mkdir: cannot create directory `tune': File exists
mkdir: cannot create directory `interfaces': File exists
make: *** [make_subdirs] Error 1
make -f Make.top startup
make[1]: Entering directory `/home/ubuntu/Libraries/ATLAS/build'
Make.top:1: Make.inc: No such file or directory
Make.top:325: warning: overriding commands for target `/AtlasTest'
Make.top:76: warning: ignoring old commands for target `/AtlasTest'
make[1]: *** No rule to make target `Make.inc'. Stop.
make[1]: Leaving directory `/home/ubuntu/Libraries/ATLAS/build'
make: *** [startup] Error 2
mv: cannot stat `lib/Makefile': No such file or directory
.././configure: 450: cannot create lib/Makefile: Directory nonexistent
.././configure: 451: cannot create lib/Makefile: Directory nonexistent
.././configure: 452: cannot create lib/Makefile: Directory nonexistent
.././configure: 453: cannot create lib/Makefile: Directory nonexistent
.././configure: 509: cannot create lib/Makefile: Directory nonexistent
DONE configure
So, I have three questions:
First: Why "Bad CPU MHZ value=0, res='CPU MHZ=0"? Note that CPU throttling is set to 0 on all cores (I checked the /sys/devices/system/cpu/cpu*/cpufreq/throttle files). Is there a way to pass the clock frequency of the ARM CPU as an argument?
Second: Why "sh: Syntax error: EOF in backquote substitution"?
Third: "Unable to find usable compiler for ICC; abortingMake.." Is there a way to tell the ./configure executable not to look for ICC? I'm trying to build ATLAS on ARM, so ICC is not available.
Thanks in advance for your help!
Guix
The ATLAS configuration environment is broken for ARM, and not very fault tolerant in general:
first it attempts to determine system performance by grepping in /proc/cpuinfo (which has never been intended for anything other than some human-readable information dump). You can override this by specifying a frequency on the configure command line: -m <MHz>
then it probes for whether power management is enabled - if it is, it bails out again. Can't see a configure option, but if you make ProbeThrottle() in CONFIG/src/backend/archinfo_linux.c return 0, it gets past that.
You're then hit by the fact that there seems to have actually been some work done for some Cortex-A8 platform several years ago, and the compiler flags set by default for ARMv7 in CONFIG/src/atlcomp.txt include -mfloat-abi=softfp. Change this to 'hard' and it will actually work on a modern ARM Linux distribution.
The syntax error is fallout from trying to look for compilers in /opt/bin and /opt/sbin and not handling errors.
With the above workarounds, I don't see any ICC errors, and the build gets a fair bit along before crashing and burning.
In short, there will be some porting effort required in order for it to work properly on ARM. Maybe you can start by sending an error report to their developer mailing list?
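Two of the workarounds above (passing the clock rate with -m and switching the default float ABI) can be sketched as follows. Assumptions: the kernel exposes the standard cpufreq file `cpuinfo_max_freq` (in kHz), GNU sed is available, and the helper names `khz_to_mhz` and `use_hard_float` are made up for illustration. The ProbeThrottle() change still has to be made by hand.

```shell
#!/bin/sh
# Convert a cpufreq value in kHz (as reported by cpuinfo_max_freq) to whole MHz.
khz_to_mhz() {
    echo $(( $1 / 1000 ))
}

# Replace -mfloat-abi=softfp with -mfloat-abi=hard in ATLAS's compiler defaults file.
use_hard_float() {
    sed -i 's/-mfloat-abi=softfp/-mfloat-abi=hard/g' "$1"
}

# Typical use from the top of the ATLAS source tree:
#   use_hard_float CONFIG/src/atlcomp.txt
#   mhz=$(khz_to_mhz "$(cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq)")
#   mkdir -p build && cd build && ../configure -m "$mhz"
```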
I recently installed mysql5 and perl5 through MacPorts to work around an earlier problem with architecture mismatches when running Perl scripts (introduced in OS X 10.6).
I downloaded the DBD::mysql package and am trying to install it manually.
perl Makefile.PL works well, as does make.
make test, however, yields the following:
PERL_DL_NONLAZY=1 /opt/local/bin/perl5 "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/00base....................ok 1/6
# Failed test 'use DBD::mysql;'
# at t/00base.t line 21.
# Tried to use 'DBD::mysql'.
# Error: Can't load '/Users/ianseyer/Downloads/DBD-mysql-4.011/blib/arch/auto/DBD/mysql/mysql.bundle' for module DBD::mysql: dlopen(/Users/ianseyer/Downloads/DBD-mysql-4.011/blib/arch/auto/DBD/mysql/mysql.bundle, 2): Symbol not found: _is_prefix
# Referenced from: /Users/ianseyer/Downloads/DBD-mysql-4.011/blib/arch/auto/DBD/mysql/mysql.bundle
# Expected in: dynamic lookup
# at (eval 7) line 2
# Compilation failed in require at (eval 7) line 2.
# BEGIN failed--compilation aborted at (eval 7) line 2.
t/00base....................NOK 2/6FAILED--Further testing stopped: Unable to load DBD::mysql
make: *** [test_dynamic] Error 255
Any ideas? Thanks.
I would start by trying to install the MacPorts version of DBD::mysql:
sudo port install p5-dbd-mysql
If that doesn't work, try cpanm:
cpanm -S DBD::mysql
Only use manual installation as a last resort.