Root access required for CUDA?

I am using a GeForce 8400M GS on Ubuntu 10.04 and I am learning CUDA programming. I am writing and running a few basic programs. I was using cudaMalloc, and it kept giving me an error until I ran the code as root. However, I had to run the code as root only once; after that, even if I run the code as a normal user, I do not get an error from cudaMalloc. What's going on?

This is probably due to your GPU not being properly initialized at boot. I've come across this problem when using Ubuntu Server and other installations where an X server isn't being started automatically. Try the following to fix it:
Create a directory for a script to initialize your GPUs. I usually use /root/bin. In this directory, create a file called cudainit.sh with the following code in it (this script came from the Nvidia forums).
#!/bin/bash
/sbin/modprobe nvidia
if [ "$?" -eq 0 ]; then
    # Count the number of NVIDIA controllers found.
    N3D=`/usr/bin/lspci | grep -i NVIDIA | grep "3D controller" | wc -l`
    NVGA=`/usr/bin/lspci | grep -i NVIDIA | grep "VGA compatible controller" | wc -l`
    N=`expr $N3D + $NVGA - 1`
    for i in `seq 0 $N`; do
        mknod -m 666 /dev/nvidia$i c 195 $i;
    done
    mknod -m 666 /dev/nvidiactl c 195 255
else
    exit 1
fi
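One detail worth noting: the script needs to be executable so that rc.local can invoke it directly, for example:
chmod +x /root/bin/cudainit.sh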
Now we need to make this script run automatically at boot. Edit /etc/rc.local to look like the following.
#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.
#
# Init CUDA for all users
#
/root/bin/cudainit.sh
exit 0
Reboot your computer and try to run your CUDA program as a regular user. If I'm right about what the problem is, then it should be fixed.
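To check whether this is actually the problem, you can inspect the NVIDIA device nodes before and after the fix; with the script in place they should exist and be world-readable/writable (crw-rw-rw-), for example:
ls -l /dev/nvidia*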

To make this work with Ubuntu 14.04, I followed https://devtalk.nvidia.com/default/topic/699610/linux/334-21-driver-returns-999-on-cuinit-cuda-/ to add nvidia-uvm to /etc/modules, and to add a custom udev rule. Create /etc/udev/rules.d/70-nvidia-uvm.rules with this line:
KERNEL=="nvidia_uvm", RUN+="/bin/bash -c '/bin/mknod -m 666 /dev/nvidia-uvm c $(grep nvidia-uvm /proc/devices | cut -d \ -f 1) 0;'"
I don't understand why sudo modprobe nvidia-uvm works to create a proper /dev/nvidia-uvm (as does running the CUDA program with sudo), but the /etc/modules listing requires the udev rule.
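For reference, the /etc/modules part of that fix is just a one-line addition; a sketch, assuming the stock Ubuntu file (whatever comment header is already there stays):
# /etc/modules: kernel modules to load at boot time.
nvidia-uvm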


What is the right way to increase the hard and soft ulimits for a singularity-container image?

The task I want to complete: I need to run a Python package inside a singularity-container that needs to open at least some 9704 files. This is the first time I have heard of this, and from searching around it has something to do with the system's ulimit.
What I currently have is the following def file.
I am setting the * hard nofile and * soft nofile values to 15000. The sed line does edit the conf file, but within the singularity shell my ulimit is still the default 1024.
Bootstrap: docker
From: fedora

%post
    dnf -y update
    dnf -y install nano pip wget libXcomposite libXcursor libXi libXtst libXrandr alsa-lib mesa-libEGL libXdamage mesa-libGL libXScrnSaver
    wget -c https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh
    /bin/bash Anaconda3-2020.02-Linux-x86_64.sh -bfp /usr/local
    conda config --file /.condarc --add channels defaults
    conda config --file /.condarc --add channels conda-forge
    conda update conda
    sed -i '2s/#/\n* hard nofile 15000\n* soft nofile 15000\n\n#/g' /etc/security/limits.conf
    bash

%runscript
    python /Users/lamsal/count_of_monte_cristo/orthofinder_run/OrthoFinder_source/orthofinder.py -f /Users/lamsal/count_of_monte_cristo/orthofinder_run/concatanated_FAs/
I am following the "official" instructions to change the ulimits for a RHEL-based system from IBM's webpage here: https://www.ibm.com/docs/en/rational-clearcase/9.0.2?topic=servers-increasing-number-file-handles-linux-workstations
Is the sed line not the right way to change ulimits for a singularity image?
Short answer:
Change the value on the host OS.
Long answer:
In this instance, running a singularity container is best thought of as any other binary you're executing in your host OS. It creates its own separate environment, but otherwise it follows the rules and restrictions of the user running it. Here, the ulimit is taken from the host kernel and completely ignores any configs that may exist in the container itself.
Compare the output from the following:
# check the ulimit on the host
ulimit -n
# check the ulimit in the singularity container
singularity exec -e image.sif ulimit -n
# docker only cares about container config settings
docker run --rm fedora:latest ulimit -n
# change your local ulimit
ulimit -n 4096
# verify it has changed
ulimit -n
# singularity has changed
singularity exec -e image.sif ulimit -n
# ... but docker hasn't
docker run --rm fedora:latest ulimit -n
To have a persistent fix, you'll need to modify the setting on your host OS. Assuming you're on macOS, this answer should take care of that.
If you don't have root privs, or you're only using this intermittently, you can run ulimit before running singularity. Alternatively, you could use a wrapper script to run the image and set it in there, as sketched below.
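A minimal sketch of such a wrapper, assuming the image is called image.sif and that the host's hard limit is at least as high as the value you request (an unprivileged user can only raise the soft limit up to the hard limit):
#!/bin/bash
# Raise the soft open-files limit for this shell only, then run the container.
ulimit -n 15000
exec singularity run image.sif "$@"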

How to execute a script when Chrome is closed?

I want to make a timestamped backup of the bookmarks with rsync every time Chrome exits. How can I trigger a script right after Chrome closes?
Edit:
This is the default launch script for Chrome on Linux Mint, with the solution I'm trying to implement added near the top:
#!/bin/bash
#
# Copyright (c) 2011 The Chromium Authors. All rights reserved.
# Use of this source code is governed by a BSD-style license that can be
# found in the LICENSE file.
# Let the wrapped binary know that it has been run through the wrapper.
export CHROME_WRAPPER="`readlink -f "$0"`"
HERE="`dirname "$CHROME_WRAPPER"`"
"/opt/google/chrome/chrome" & pid=$!
wait $pid
if ! pgrep chrome > /dev/null; then
  echo "It exited successfully"
fi
# Check if the CPU supports SSE2. If not, try to pop up a dialog to explain the
# problem and exit. Otherwise the browser will just crash with a SIGILL.
# http://crbug.com/348761
grep ^flags /proc/cpuinfo|grep -qs sse2
if [ $? != 0 ]; then
  SSE2_DEPRECATION_MSG="This computer can no longer run Google Chrome because \
its hardware is no longer supported."
  if which zenity &> /dev/null; then
    zenity --warning --text="$SSE2_DEPRECATION_MSG"
  elif which gmessage &> /dev/null; then
    gmessage "$SSE2_DEPRECATION_MSG"
  elif which xmessage &> /dev/null; then
    xmessage "$SSE2_DEPRECATION_MSG"
  else
    echo "$SSE2_DEPRECATION_MSG" 1>&2
  fi
  exit 1
fi
# We include some xdg utilities next to the binary, and we want to prefer them
# over the system versions when we know the system versions are very old. We
# detect whether the system xdg utilities are sufficiently new to be likely to
# work for us by looking for xdg-settings. If we find it, we leave $PATH alone,
# so that the system xdg utilities (including any distro patches) will be used.
if ! which xdg-settings &> /dev/null; then
  # Old xdg utilities. Prepend $HERE to $PATH to use ours instead.
  export PATH="$HERE:$PATH"
else
  # Use system xdg utilities. But first create mimeapps.list if it doesn't
  # exist; some systems have bugs in xdg-mime that make it fail without it.
  xdg_app_dir="${XDG_DATA_HOME:-$HOME/.local/share/applications}"
  mkdir -p "$xdg_app_dir"
  [ -f "$xdg_app_dir/mimeapps.list" ] || touch "$xdg_app_dir/mimeapps.list"
fi
# Always use our versions of ffmpeg libs.
# This also makes RPMs find the compatibly-named library symlinks.
if [[ -n "$LD_LIBRARY_PATH" ]]; then
LD_LIBRARY_PATH="$HERE:$HERE/lib:$LD_LIBRARY_PATH"
else
LD_LIBRARY_PATH="$HERE:$HERE/lib"
fi
export LD_LIBRARY_PATH
export CHROME_VERSION_EXTRA="stable"
# We don't want bug-buddy intercepting our crashes. http://crbug.com/24120
export GNOME_DISABLE_CRASH_DIALOG=SET_BY_GOOGLE_CHROME
# Automagically migrate user data directory.
# TODO(phajdan.jr): Remove along with migration code in the browser for M33.
if [[ -n "" ]]; then
if [[ ! -d "" ]]; then
"$HERE/chrome" "--migrate-data-dir-for-sxs=" \
--enable-logging=stderr --log-level=0
fi
fi
# Sanitize std{in,out,err} because they'll be shared with untrusted child
# processes (http://crbug.com/376567).
exec < /dev/null
exec > >(exec cat)
exec 2> >(exec cat >&2)
# Make sure that the profile directory specified in the environment, if any,
# overrides the default.
if [[ -n "$CHROME_USER_DATA_DIR" ]]; then
# Note: exec -a below is a bashism.
exec -a "$0" "$HERE/chrome" \
--user-data-dir="$CHROME_USER_DATA_DIR" "$#"
else
exec -a "$0" "$HERE/chrome" "$#"
fi
What OS are you using? In OS X, this shell script will start Chrome and then do stuff when Chrome quits - it should be easy to adapt it for your needs in any Unix-like OS.
#! /usr/bin/env sh
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" & pid=$!
wait $pid
if ! pgrep Chrome > /dev/null; then # no instances of Chrome are running
    # do stuff
fi
(based on this answer)
Edit: And this works for me in Ubuntu:
#! /usr/bin/env sh
/opt/google/chrome/chrome & pid=$!
wait $pid
if ! pgrep chrome > /dev/null; then
    # do stuff
fi
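For the bookmarks use case from the question, the "# do stuff" part can be the rsync backup itself. A sketch, assuming the default profile (on Linux the Bookmarks file lives at ~/.config/google-chrome/Default/Bookmarks) and a backup directory of your choosing:
#! /usr/bin/env sh
BOOKMARKS="$HOME/.config/google-chrome/Default/Bookmarks"
BACKUP_DIR="$HOME/chrome-bookmark-backups"   # example destination
mkdir -p "$BACKUP_DIR"
/opt/google/chrome/chrome & pid=$!
wait $pid
if ! pgrep chrome > /dev/null; then
    # timestamped copy, e.g. Bookmarks.20240101-120000
    rsync -a "$BOOKMARKS" "$BACKUP_DIR/Bookmarks.$(date +%Y%m%d-%H%M%S)"
fi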
Due to the way Google Chrome uses processes, monitoring the PID of a specific Google Chrome window (or instance) is usually a headache, or (almost) impossible.
However, there is a workaround that makes this possible: use the --user-data-dir parameter pointing to the /tmp folder (--user-data-dir=/tmp).
To illustrate, below is a bash script that starts the Gitea web service, opens it in a Google Chrome window, and then terminates Gitea when that Chrome window is closed.
#!/bin/bash
gitea & GITEA_PID=$!
sleep 3
google-chrome-stable --user-data-dir=/tmp --app=http://0.0.0.0:3000/ & CHROME_PID=$!
wait $CHROME_PID
kill -9 $GITEA_PID
Basically I'm using the same idea as yours; just adapt it. 🥰
[Ref(s).: https://stackoverflow.com/a/75013043/3223785 ,
https://stackoverflow.com/a/35294908/3223785 ,
https://www.ghacks.net/2013/10/06/list-useful-google-chrome-command-line-switches/ ,
https://www.reddit.com/r/firefox/comments/w61dwi/how_do_i_start_firefox_in_a_single_window_with/?utm_source=share&utm_medium=web2x&context=3 , ]

How to reinitialize the database

I have downloaded a demo copy of Hybris for evaluation purposes, and it has been more than 30 days since I downloaded it. Recently I tried to restart it, but it would not start, and instead gave me the following message:
"This licence is only for demo or develop usage and is valid for 30 days.
After this time you have to reinitialize database to continue your work."
I am/have been running it on a Mac, and the database is MySQL...
What (UNIX) commands do I use to re-initialise the database, so that I can start up the Hybris Server?
Using the command line in the Terminal application, go to YOURPATH/hybris/bin/platform, run ant clean all and then ant initialize, and then start hybris:
1) Go to your platform directory
cd $YOURPATH/hybris/bin/platform
2) Set ant's environment by running "dot" "space" "dot-slash" setantenv.sh
. ./setantenv.sh
3) Then run ant clean all (to clean environment)
ant clean all
4) then run ant initialize (to re-initialize environment)
ant initialize
5) Re-start the hybris server process by running hybrisserver.sh
./hybrisserver.sh
6) have a nice rest of your day! (if this helped you then please give an UP vote - thanks!)
:)
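If you prefer a one-liner, the same steps can be chained (these are just the commands from the list above, run with the server stopped):
cd $YOURPATH/hybris/bin/platform && . ./setantenv.sh && ant clean all && ant initialize && ./hybrisserver.sh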
You can use the Ant command ant initialize and the error will go away.
ant initialize would remove the tables that exist in the Hybris items.xml files. If you want to reset your DB, I have a script that I use across various projects (it can be found here, on GitHub):
#!/bin/bash
MUSER="$1"
MPASS="$2"
MDB="$3"
# Detect paths
MYSQL=$(which mysql)
AWK=$(which awk)
GREP=$(which grep)
if [ $# -ne 3 ]
then
    echo "Usage: $0 {MySQL-User-Name} {MySQL-User-Password} {MySQL-Database-Name}"
    echo "Drops all tables from a MySQL"
    exit 1
fi
TABLES=$($MYSQL -u $MUSER -p$MPASS $MDB -e 'show tables' | $AWK '{ print $1}' | $GREP -v '^Tables' )
for t in $TABLES
do
    echo "Deleting $t table from $MDB database..."
    $MYSQL -u $MUSER -p$MPASS $MDB -e "drop table $t"
done
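Usage is then, for example (assuming you saved it as drop_tables.sh and made it executable; the credentials and database name are placeholders):
./drop_tables.sh hybrisuser hybrispass hybrisdb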
You need to reinitialize (ant all) and rebuild hybris as you did the first time.
Reason: the evaluation copy works only for 30 days, and after that it expires.
When you start your server, the console will show a message to this effect; please check.
You can also use the Hybris Administration Console to run the initialization:
Platform -> Initialization

How to download logs from child gears

I have OpenShift Enterprise 2.0 running in a multi-node setup. I am running a simple JBoss scaled app (3 gears, so HAProxy and 2 JBoss gears). I have used a pre_start_jbossews script in .openshift/action_hooks to configure verbose GC logging (with just gc.log as the file name). However, I can't figure out how to get the gc.log files from the gears running JBoss.
[Interestingly enough, there is an empty gc.log file in the head/parent gear (running HAProxy). Looks like there is a java process started there, that might be a bug.]
I tried to run
rhc scp <appname> download . jbossews/gc.log --gears
hoping that it would be implemented like the ssh --gears option, but it just tells me 'invalid option'. So my question is, how can I actually download logs from child gears?
I don't think that you can use RHC directly to get what you want.
That may require a Request for Enhancement to be filed against the RHC SCP command.
File that here: https://github.com/openshift/rhc/issues
However you can use the following to find all of your GEARS:
rhc app show APP_NAME --gears | awk '{print $5}' | tail -n +3
From this list you can list all the logs for each gear that are part of that application.
for url in $(rhc app show APP_NAME --gears | awk '{print $5}' | tail -n +3); do for dir in $(ssh $url "ls -R | grep -i log.*:"); do echo -n $url:${dir%?}; echo; done; done
With that you can use simple scp commands to get the files you need from all of the gears:
for file_dir in $(for url in $(rhc app show APP_NAME --gears | awk '{print $5}' | tail -n +3); do for dir in $(ssh $url "ls -R | grep -i log.*:"); do echo -n $url:${dir%?}; echo; done; done); do scp "$file_dir/*" .; done
If you need to download individual files, you can use an SFTP client like FileZilla to copy them from the server.
I know it's been a while since the original question was posted, but I just bumped into the same issue today and found that you can use the scp command directly if you know the gear SSH URL:
scp local_file user@gear_ssh:remote_file
to upload a file to the gear, or
scp user@gear_ssh:remote_file local_file
to download from the gear.
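Combining the two answers, a sketch that pulls the question's jbossews/gc.log from every gear of an app (APP_NAME is a placeholder; the gear SSH URLs come from the rhc output as shown above):
for gear in $(rhc app show APP_NAME --gears | awk '{print $5}' | tail -n +3); do
    # name each copy after the gear UUID (the part before the @)
    scp "$gear:jbossews/gc.log" "./gc-${gear%%@*}.log"
done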

How to copy the environment variables in cluster system using qsub?

I use SUN's SGE to submit my jobs to a cluster system. The problem is how to let the computing machine find the environment variables of the host machine, or how to configure the qsub script so that the computing machine loads the host machine's environment variables.
The following is an example script, but it reports errors such as libraries not being found:
#!/bin/bash
#
#$ -V
#$ -cwd
#$ -j y
#$ -o /home/user/jobs_log/$JOB_ID.out
#$ -e /home/user/jobs_log/$JOB_ID.err
#$ -S /bin/bash
#
echo "Starting job: $SGE_TASK_ID"
# Modify this to use the path to matlab for your system
/home/user/Matlab/bin/matlab -nojvm -nodisplay -r matlab_job
echo "Done with job: $SGE_TASK_ID"
The technique you are using (adding -V) should work. One possibility, since you are specifying the shell with -S, is that grid engine is configured to launch /bin/bash as a login shell, and your profile scripts are stomping all over the environment you are trying to pass to the job.
Try using qstat -xml -j on the job while it is queued/running to see what environment variables grid engine is trying to pass to the job.
Try adding an env command to the script to see what variables are set.
Try adding shopt -q login_shell;echo $? in the script to tell you if it is being run as a login shell.
To list out shells that are configured as login shells in grid engine try:
SGE_SINGLE_LINE=true qconf -sconf|grep ^login_shells
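Putting those diagnostic suggestions into a throwaway job script might look like this (a sketch; the directives are the ones from the question):
#!/bin/bash
#$ -V
#$ -cwd
#$ -S /bin/bash
# Dump the environment the job actually received
env | sort
# Exit status 0 means this shell was started as a login shell
shopt -q login_shell; echo "login_shell exit status: $?"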
I think this issue is due to bash not being configured in the login_shells of SGE.
Check your login_shells with qconf -sconf and see if bash is in there.
login_shells
UNIX command interpreters like the Bourne-Shell (see sh(1)) or the C-Shell (see csh(1)) can be used by Grid Engine to start job scripts. The command interpreters can either be started as login-shells (i.e. all system and user default resource files like .login or .profile will be executed when the command interpreter is started and the environment for the job will be set up as if the user has just logged in) or just for command execution (i.e. only shell specific resource files like .cshrc will be executed and a minimal default environment is set up by Grid Engine - see qsub(1)). The parameter login_shells contains a comma separated list of the executable names of the command interpreters to be started as login-shells. Shells in this list are only started as login shells if the parameter shell_start_mode (see above) is set to posix_compliant.
Changes to login_shells will take immediate effect. The default for login_shells is sh,csh,tcsh,ksh.
This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.