Asking about the installation of Caffe - caffe

I was installing the Caffe library on Mac OS, but when I ran 'make runtest' I encountered the following problem. What should I do? Thanks in advance. My MacBook doesn't have a CUDA-capable GPU; does this affect the installation?
.build_release/test/test_all.testbin 0 --gtest_shuffle
Cuda number of devices: 32767
Setting to use device 0
Current device id: 0
Current device name:
Note: Randomizing tests' orders with a seed of 14037 .
[==========] Running 1927 tests from 259 test cases.
[----------] Global test environment set-up.
[----------] 4 tests from BlobSimpleTest/0, where TypeParam = f
[ RUN ] BlobSimpleTest/0.TestPointersCPUGPU
E0306 11:45:15.035683 2126779136 common.cpp:104] Cannot create Cublas handle. Cublas won't be available.
E0306 11:45:15.114891 2126779136 common.cpp:111] Cannot create Curand generator. Curand won't be available.
F0306 11:45:15.115012 2126779136 syncedmem.cpp:55] Check failed: error == cudaSuccess (35 vs. 0) CUDA driver version is insufficient for CUDA runtime version
*** Check failure stack trace: ***
# 0x10d2c976a google::LogMessage::Fail()
# 0x10d2c8f14 google::LogMessage::SendToLog()
# 0x10d2c93c7 google::LogMessage::Flush()
# 0x10d2cc679 google::LogMessageFatal::~LogMessageFatal()
# 0x10d2c9a4f google::LogMessageFatal::~LogMessageFatal()
# 0x10e023406 caffe::SyncedMemory::to_gpu()
# 0x10e022c5e caffe::SyncedMemory::gpu_data()
# 0x108021d9c caffe::BlobSimpleTest_TestPointersCPUGPU_Test<>::TestBody()
# 0x10849ba5c testing::internal::HandleExceptionsInMethodIfSupported<>()
# 0x10848a1ba testing::Test::Run()
# 0x10848b0e2 testing::TestInfo::Run()
# 0x10848b7d0 testing::TestCase::Run()
# 0x108491f86 testing::internal::UnitTestImpl::RunAllTests()
# 0x10849c264 testing::internal::HandleExceptionsInMethodIfSupported<>()
# 0x108491c99 testing::UnitTest::Run()
# 0x107f8c89a main
# 0x7fff903e15c9 start
# 0x3 (unknown)
make: *** [runtest] Abort trap: 6

I had the same issue, but I have a graphics card specifically to run Caffe on, so CPU_ONLY was not an option ;-)
To check whether it's the same cause as mine, try running the CUDA Samples deviceQuery example.
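For reference, a rough sketch of building and running deviceQuery; the samples path below is an assumption and depends on your CUDA toolkit version and install location:
cd /usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make          # build the sample if it has not been built yet
./deviceQuery      # should list your GPU(s) and finish with "Result = PASS"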
I fixed it using the runfile verification steps from the CUDA Installation Guide:
sudo chmod 0666 /dev/nvidia*

Finally, I found a solution by setting CPU_ONLY := 1 in Makefile.config (uncomment the original line by removing the '#' in front of "CPU_ONLY := 1" in Makefile.config) and rerunning the commands "make clean", "make all", "make test", and "make runtest", referring to this link - https://github.com/BVLC/caffe/issues/736
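In other words, a minimal sketch of that CPU-only rebuild (it assumes Makefile.config has already been edited to uncomment CPU_ONLY := 1):
make clean
make all
make test
make runtest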

Related

Why does installing imblearn with pip fail?

I am trying to install the Python package "imblearn" to balance datasets,
with the command pip install imblearn, but it keeps failing.
I have tried from cmd and from PowerShell with admin privileges,
with the regular pip command, and with a git clone of the repo followed by pip install.
Everything fails.
The error is:
C:\Users\ronke>pip install imblearn
Collecting imblearn
Using cached imblearn-0.0-py2.py3-none-any.whl (1.9 kB)
Collecting imbalanced-learn
Using cached imbalanced_learn-0.10.1-py3-none-any.whl (226 kB)
Requirement already satisfied: scipy>=1.3.2 in c:\networks\python3.8\lib\site-packages (from imbalanced-learn->imblearn) (1.6.3)
Collecting scikit-learn>=1.0.2
Using cached scikit-learn-1.2.0.tar.gz (7.2 MB)
Installing build dependencies ... error
error: subprocess-exited-with-error
× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> [73 lines of output]
Ignoring numpy: markers 'python_version == "3.10" and platform_system == "Windows" and platform_python_implementation != "PyPy"' don't match your environment
Collecting setuptools<60.0
Using cached setuptools-59.8.0-py3-none-any.whl (952 kB)
Collecting wheel
Using cached wheel-0.38.4-py3-none-any.whl (36 kB)
Collecting Cython>=0.29.24
Using cached Cython-0.29.32-py2.py3-none-any.whl (986 kB)
Collecting oldest-supported-numpy
Using cached oldest_supported_numpy-2022.11.19-py3-none-any.whl (4.9 kB)
Collecting scipy>=1.3.2
Using cached scipy-1.9.3.tar.gz (42.1 MB)
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Installing backend dependencies: started
Installing backend dependencies: finished with status 'done'
Preparing metadata (pyproject.toml): started
Preparing metadata (pyproject.toml): finished with status 'error'
error: subprocess-exited-with-error
Preparing metadata (pyproject.toml) did not run successfully.
exit code: 1
[37 lines of output]
+ meson setup --prefix=c:\networks\python3.8 C:\Users\ronke\AppData\Local\Temp\pip-install-n9p_hxtm\scipy_cef72cd617894d719469ea6e03d892cb C:\Users\ronke\AppData\Local\Temp\pip-install-n9p_hxtm\scipy_cef72cd617894d719469ea6e03d892cb\.mesonpy-jv80z8m8\build --native-file=C:\Users\ronke\AppData\Local\Temp\pip-install-n9p_hxtm\scipy_cef72cd617894d719469ea6e03d892cb\.mesonpy-native-file.ini -Ddebug=false -Doptimization=2
The Meson build system
Version: 1.0.0
Source dir: C:\Users\ronke\AppData\Local\Temp\pip-install-n9p_hxtm\scipy_cef72cd617894d719469ea6e03d892cb
Build dir: C:\Users\ronke\AppData\Local\Temp\pip-install-n9p_hxtm\scipy_cef72cd617894d719469ea6e03d892cb\.mesonpy-jv80z8m8\build
Build type: native build
Project name: SciPy
Project version: 1.9.3
Activating VS 17.1.3
C compiler for the host machine: cl (msvc 19.31.31105 "Microsoft (R) C/C++ Optimizing Compiler Version 19.31.31105 for x64")
C linker for the host machine: link link 14.31.31105.0
C++ compiler for the host machine: cl (msvc 19.31.31105 "Microsoft (R) C/C++ Optimizing Compiler Version 19.31.31105 for x64")
C++ linker for the host machine: link link 14.31.31105.0
Host machine cpu family: x86_64
Host machine cpu: x86_64
Compiler for C supports arguments -Wno-unused-but-set-variable: NO
Compiler for C supports arguments -Wno-unused-but-set-variable: NO (cached)
Compiler for C supports arguments -Wno-unused-function: NO
Compiler for C supports arguments -Wno-conversion: NO
Compiler for C supports arguments -Wno-misleading-indentation: NO
Compiler for C supports arguments -Wno-incompatible-pointer-types: NO
Library m found: NO
..\..\meson.build:57:0: ERROR: Unknown compiler(s): [['ifort'], ['gfortran'], ['flang'], ['pgfortran'], ['g95']]
The following exception(s) were encountered:
Running `ifort --version` gave "[WinError 2] The system cannot find the file specified"
Running `ifort -V` gave "[WinError 2] The system cannot find the file specified"
Running `gfortran --version` gave "[WinError 2] The system cannot find the file specified"
Running `gfortran -V` gave "[WinError 2] The system cannot find the file specified"
Running `flang --version` gave "[WinError 2] The system cannot find the file specified"
Running `flang -V` gave "[WinError 2] The system cannot find the file specified"
Running `pgfortran --version` gave "[WinError 2] The system cannot find the file specified"
Running `pgfortran -V` gave "[WinError 2] The system cannot find the file specified"
Running `g95 --version` gave "[WinError 2] The system cannot find the file specified"
Running `g95 -V` gave "[WinError 2] The system cannot find the file specified"
A full log can be found at C:\Users\ronke\AppData\Local\Temp\pip-install-n9p_hxtm\scipy_cef72cd617894d719469ea6e03d892cb\.mesonpy-jv80z8m8\build\meson-logs\meson-log.txt
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
Encountered error while generating package metadata.
See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
Does someone know how to fix it?
Thanks,
Ron
I've condensed this to show the key part of the error:
Collecting scipy>=1.3.2
Using cached scipy-1.9.3.tar.gz (42.1 MB)
...
..\..\meson.build:57:0: ERROR: Unknown compiler(s): [['ifort'], ['gfortran'], ['flang'], ['pgfortran'], ['g95']]
This says that scipy==1.9.3 failed to compile because a Fortran compiler was not available.
This is probably caused by a 32-bit version of Python / Windows, since pre-compiled scipy and scikit-learn wheels are being phased out for 32-bit systems, and we're no longer testing 32-bit Windows in imblearn (see PR#936).
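One quick way to confirm that diagnosis is to print the pointer width of your interpreter; 32 means a 32-bit Python:
python -c "import struct; print(struct.calcsize('P') * 8)"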
One possible fix is to install a copy of Python with Anaconda or Miniconda, then:
conda install -c conda-forge imbalanced-learn
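For example, a rough sketch of that route in a fresh environment (the environment name and Python version are arbitrary choices, not requirements):
conda create -n imb-env python=3.10
conda activate imb-env
conda install -c conda-forge imbalanced-learn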

NS 2.35 segmentation fault (core dumped) when running MDART protocol

When I run an MDART routing protocol Tcl script in NS 2.35, it says:
When configured, ns found the right version of tclsh in /usr/bin/tclsh8.6
but it doesn't seem to be there anymore, so ns will fall back on running the first tclsh in your path. The wrong version of tclsh may break the test suites. Reconfigure and rebuild ns if this is a problem.
num_nodes is set 16
INITIALIZE THE LIST xListHead
channel.cc:sendUp - Calc highestAntennaZ_ and distCST_
highestAntennaZ_ = 1.5, distCST_ = 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.0
SORTING LISTS ...DONE!
Segmentation fault (core dumped)
Also, the simulation end time is supposedly 205 s, but when I run the animation, the simulation ends at 8 s. Why is that? Thanks.
ns found the right version of tclsh in /usr/bin/tclsh8.6
but it doesn't seem to be there anymore
tcl8.6: You are supposed to use the tcl8.5.10 bundled with ns-2.35: it doesn't change version or location (unless you move ns-allinone-2.35). The external tcl8.6 can change with e.g. an update, and later versions tend to be missing some files, e.g. in Debian / Ubuntu.
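A quick way to see which tclsh is first in your PATH and which version it reports:
which tclsh
echo 'puts $tcl_version' | tclsh    # prints e.g. 8.5 or 8.6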
Example https://drive.google.com/file/d/0B7S255p3kFXNVVlxR0ZNRGVORjQ/view?usp=sharing
$ tar xvf ns-allinone-2.35_gcc5.tar.gz ## 2014 - 2017 update
$ cd ns-allinone-2.35/
$ export CC=gcc-4.8 CXX=g++-4.8 && ./install
Segmentation fault
MDART cannot be used with a contemporary OS. The latest that worked was an Ubuntu 18.04.4 updated 16 months ago. Please see my tests https://drive.google.com/drive/folders/1si2jA3lc-23lubVHb3tFbIAXfnhRfg5O?usp=sharing : CentOS 8 fails, Ubuntu 20.04 fails, etc.; a "2021 OS" fails.
EDIT: Further tests revealed that an updated Ubuntu 18.04 failed: the latest Ubuntu version that works for MDART is 16.04.
NOTE 1: The Ubuntu 16.04 nam package is corrupt. Please use https://drive.google.com/file/d/0B7S255p3kFXNdmxzSmRzaVRWb28/view?usp=sharing → nam_1.15-10-ubuntu14_amd64.deb
NOTE 2: The Ubuntu 16.04 ns command : sudo apt install ns2
NOTE 3: Building ns-allinone-2.35/ → Four cases of random Tk errors after the latest Ubuntu updates. Possible solutions: Use ns-allinone-2.35_2021.tar.xz https://drive.google.com/file/d/167cP7hPnJGiNL3rK4Mxnh_-0t7_S8FTL/view?usp=sharing with Tcl, Tk updated to version 8.5.17 .... And there are three options for extra gcc/g++ compilers to try out https://drive.google.com/drive/folders/1xVEATaYAwqvseBzYxKDzJoZ4-Hc_XOJm?usp=sharing
export CC=gcc447 CXX=g++447 && ./install ## can also be used with ns-allinone-2.35 version 2011
export CC=gcc48 CXX=g++48 && ./install
export CC=gcc54 CXX=g++54 && ./install
Simulation time: The setting is a maximum time. Example: the setting set val(end) 1006.0 will run for about 6 seconds and end the output text with: 1000 simulation seconds. Time is relative; ns2 was developed in the 90s, when processors were very slow (Pentium 1 / Pentium 2), and different protocols behave differently with respect to simulation time.

Segfault triggered by multiple GPUs

I am running a training script with Caffe on an 8-GPU (1080Ti) server.
If I train on 6 or fewer GPUs (using CUDA_VISIBLE_DEVICES), everything is fine.
(I set export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 and specify these GPUs in the training script.)
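For reference, a minimal sketch of the working 6-GPU invocation described above (the solver filename is a placeholder):
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5          # expose only the first six GPUs to the process
caffe train --solver=solver.prototxt --gpu=0,1,2,3,4,5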
But if I train on 7 or 8 GPUs, I see this error at the start of training consistently:
Error: (unix time) try if you are using GNU date
SIGSEGV (#0x70) received by PID 17206 (TID 0x7fc678ffd700) from PID 112; stack trace:
# 0x7fc86186b4b0 (unknown)
# 0x7fc861983f75 (unknown)
# 0x7fc863c4b4c7 std::__cxx11::basic_string<>::_M_construct<>()
# 0x7fc863c4c60b _ZN5caffe2db10LMDBCursor5valueB5cxx11Ev
# 0x7fc863ace3e7 caffe::AnnotatedDataLayer<>::DataLayerSetUp()
# 0x7fc863a6e4d5 caffe::BasePrefetchingDataLayer<>::LayerSetUp()
# 0x7fc863cbf2b4 caffe::Net<>::Init()
# 0x7fc863cc11ae caffe::Net<>::Net()
# 0x7fc863bb9c9a caffe::Solver<>::InitTestNets()
# 0x7fc863bbb84d caffe::Solver<>::Init()
# 0x7fc863bbbb3f caffe::Solver<>::Solver()
# 0x7fc863ba7d61 caffe::Creator_SGDSolver<>()
# 0x7fc863ccc1c2 caffe::Worker<>::InternalThreadEntry()
# 0x7fc863cf94c5 caffe::InternalThread::entry()
# 0x7fc863cfa38e boost::detail::thread_data<>::run()
# 0x7fc85350d5d5 (unknown)
# 0x7fc83fee56ba start_thread
# 0x7fc86193d41d clone
# 0x0 (unknown)
The Error: (unix time) ... at the start of the trace is apparently emitted by glog and appears when a general failure happens.
This thread shows many different issues triggering Error: (unix time)... and a similar trace.
In the thread, it is noted that multiple GPUs may trigger this error, and that appears to be the root cause in my case.
Are there things I can look into further to understand what is happening?

Compile error during Caffe installation on OS X 10.11

I've configured the Caffe environment on my Mac several times, but this time I encountered a problem I've never seen before:
I use Intel's MKL for accelerating computation instead of ATLAS, and I use Anaconda 2.7 and OpenCV 2.4, with Xcode 7.3.1 on OS X 10.11.6.
When I run
make all -j8
in the terminal under Caffe's root directory, the error info is:
AR -o .build_release/lib/libcaffe.a
LD -o .build_release/lib/libcaffe.so.1.0.0-rc5
clang: warning: argument unused during compilation: '-pthread'
ld: can't map file, errno=22 file '/usr/local/cuda/lib' for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [.build_release/lib/libcaffe.so.1.0.0-rc5] Error 1
make: *** Waiting for unfinished jobs....
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: file: .build_release/lib/libcaffe.a(parallel.o) has no symbols
I've tried many times; can anyone help me out?
This looks like you haven't changed Makefile.config from GPU to CPU mode. There shouldn't be anything trying to actively link that library. I think the only CUDA one you should need is libicudata.so
Look for the lines
# CPU-only switch (uncomment to build without GPU support).
# CPU_ONLY := 1
and remove the octothorpe from the front of the second line.
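As a rough sketch, the same edit can be done from the command line in the Caffe source root (BSD/macOS sed syntax), followed by a clean rebuild:
sed -i '' 's/^# *CPU_ONLY := 1/CPU_ONLY := 1/' Makefile.config    # uncomment the CPU-only switch
make clean && make all -j8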

What to do with 'Bus error' in caffe while training?

I am using an NVIDIA Jetson TX1 and Caffe to train AlexNet on my own data.
I have 104,000 training and 20,000 validation images fed to my model, with a batch size of 16 for both test and train.
I run the solver for training and I get this Bus error after 1300 iterations:
.
.
.
I0923 12:08:37.121116 2341 sgd_solver.cpp:106] Iteration 1300, lr = 0.01
*** Aborted at 1474628919 (unix time) try "date -d #1474628919" if you are using GNU date ***
PC: # 0x0 (unknown)
*** SIGBUS (#0x7ddea45000) received by PID 2341 (TID 0x7faa9fdf70) from PID 18446744073149894656; stack trace: ***
# 0x7fb4b014e0 (unknown)
# 0x7fb3ebe8b0 (unknown)
# 0x7fb4057248 (unknown)
# 0x7fb40572b4 (unknown)
# 0x7fb446e120 caffe::db::LMDBCursor::value()
# 0x7fb4587624 caffe::DataReader::Body::read_one()
# 0x7fb4587a90 caffe::DataReader::Body::InternalThreadEntry()
# 0x7fb458a870 caffe::InternalThread::entry()
# 0x7fb458b0d4 boost::detail::thread_data<>::run()
# 0x7fafdf7ef0 (unknown)
# 0x7fafcfde48 start_thread
Bus error
I use Ubuntu 14, an NVIDIA Tegra X1, and 3.8 GB of RAM.
As I understand it, this is a memory issue. Could you please explain it in more detail and help me figure out how to solve this problem?
If any other information is needed, please let me know.