502 Proxy Error from OpenShift DIY project

On my OpenShift account I have set up Tomcat 8 and JDK 8 on a DIY application, with the MySQL and phpMyAdmin cartridges installed.
My WAR file points to everything correctly and there are no errors on startup in any of the logs. However, when I go to my OpenShift URL I receive this 502 Proxy Error in the browser (I'm using Chrome):
The proxy server received an invalid response from an upstream server.
The proxy server could not handle the request GET /.
What could be causing this problem?

@Graham Where's the fun in that? I'm going to share my experience in case anybody else ends up here. I think in my case I was hitting the upper limit of allowed CPU and memory usage for my free gear. Nothing jumps out and yells "You hit the limit", but it was pretty clear something was wrong. I'm happy with the results and glad I stuck it out; I learned a lot about deploying to an online server on a meager budget.
General troubleshooting instructions start here.
First, I shut the server down hard with rhc app-force-stop <app_name>. After that I was able to start the system up again and it worked fine. In my case I was trying to do too much for the size of server I was paying for (free!). The free gear includes 512 MB of RAM and 1 GB of storage, and I was trying to run Node, MongoDB and a cron cartridge in it. On top of that I had a lot of asynchronous I/O with quite a large stack built up. In hindsight, not clever.
Error detection wasn't easy. I learned nothing from the log files; generally, when something went wrong, they simply stopped recording.
There are 11 checks to run. First log in to the server via SSH from your command-line tool. Note that there is no magic "you screwed up here" message: you have to look at your usage and compare it to your allowed usage levels. This took me a while, but I documented it for my own notes, and this is a good place to share it with others. Good luck. (In my case, I deleted the cron and MongoDB cartridges. I now host the database at mlab.com, where it's accessible from my other projects. Success for me.)
1) Memory Fail Counts: (results should be zero...)
oo-cgroup-read memory.failcnt // my results --> 160031
oo-cgroup-read memory.memsw.failcnt // my results --> 8572
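The "should be zero" rule for the two counters above can be checked mechanically. This is a minimal sketch, assuming you have captured the text printed by each oo-cgroup-read call yourself; the function name nonzero_failcounts is hypothetical and not part of any OpenShift tool.

```python
def nonzero_failcounts(readings):
    """Return the counters whose value is above zero.

    `readings` maps a cgroup counter name to the raw text printed by
    `oo-cgroup-read <counter>` (captured however you like).
    """
    failing = {}
    for name, raw in readings.items():
        value = int(raw.strip())
        if value > 0:
            failing[name] = value
    return failing


if __name__ == "__main__":
    # Values taken from the output shown above.
    sample = {"memory.failcnt": "160031", "memory.memsw.failcnt": "8572"}
    for name, value in nonzero_failcounts(sample).items():
        print(f"{name}: limit was hit {value} times")
```

Any counter this returns is one where the gear ran into its memory limit at least once.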
2) Check disk Quotas
[xyz-abc.rhcloud.com 5xxx3]\> quota -s
Disk quotas for user 5xxx3 (uid 3488):
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
/dev/mapper/EBSStore01-user_home01
                   608M       0   1024M           12664       0   80000
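To turn the quota figures above into a percentage, a small helper is enough. This is a sketch under assumptions: the helper names to_mib and percent_used are made up for illustration, and sizes without a suffix are treated as 1 KiB blocks, which is what quota normally reports without -s.

```python
# Multipliers that convert a `quota -s` suffix to MiB.
UNITS = {"K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 * 1024}


def to_mib(size):
    """Convert a `quota -s` style size such as '608M' to MiB."""
    size = size.strip()
    suffix = size[-1].upper()
    if suffix in UNITS:
        return float(size[:-1]) * UNITS[suffix]
    # Assumption: no suffix means a count of 1 KiB blocks.
    return float(size) / 1024


def percent_used(used, limit):
    """Percentage of the hard limit currently in use."""
    return 100.0 * to_mib(used) / to_mib(limit)


if __name__ == "__main__":
    # Figures from the quota output above: 608M used of a 1024M limit.
    print(f"disk: {percent_used('608M', '1024M'):.1f}% of limit")
    # File count: 12664 of 80000 inodes.
    print(f"files: {100.0 * 12664 / 80000:.1f}% of limit")
```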
3) Check your actual disk usage. (du = disk usage. -s: print the sum for each directory. -h: human-readable sizes in bytes, kilobytes, megabytes, gigabytes, terabytes and petabytes.)
du -sh ~
du: cannot read directory `/var/lib/openshift/5xxx3/.tmp': Permission denied
du: cannot read directory `/var/lib/openshift/5xxx3/.sandbox': Permission denied
du: cannot read directory `/var/lib/openshift/5xxx3/.ssh': Permission denied
du: cannot read directory `/var/lib/openshift/5xxx3/.gearstats': Permission denied
607M /var/lib/openshift/5xxx3/
4) List open files. (lsof, "list open files", is available on many Unix-like systems and reports all open files and the processes that opened them. -n: do not resolve hostnames (no DNS). -P: do not resolve port names (list the port number instead of its name).)
lsof -n -P
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
mongod 471639 3488 11u IPv4 423798423 0t0 TCP 127.x.y.z:27017 (LISTEN)
node 475151 3488 10u IPv4 423815802 0t0 TCP 127.x.y.z:8080 (LISTEN)
5) Display the top CPU-intensive processes. (top provides frequently refreshed information about the most CPU-intensive processes currently running. You do not need to include a - before options. -b: run in batch mode; don't accept command-line input. Useful for sending output to another command or to a file. -n num: update the display num times, then exit.)
top -b -n 1
top - 00:48:37 up 13 days, 23:52, 0 users, load average: 2.91, 2.27, 2.09
Tasks: 13 total, 1 running, 12 sleeping, 0 stopped, 0 zombie
Cpu(s): 11.6%us, 10.0%sy, 0.1%ni, 77.5%id, 0.5%wa, 0.0%hi, 0.2%si, 0.1%st
Mem: 15297608k total, 14537912k used, 759696k free, 36456k buffers
Swap: 52428792k total, 16372136k used, 36056656k free, 2720680k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
60898 3488 20 0 12800 968 744 R 1.9 0.0 0:00.02 top
55776 3488 20 0 106m 2740 808 S 0.0 0.0 0:00.00 sshd
55779 3488 20 0 104m 2260 1432 S 0.0 0.0 0:00.09 bash
432471 3488 20 0 106m 888 884 S 0.0 0.0 0:00.00 sshd
432475 3488 20 0 55144 1540 1536 S 0.0 0.0 0:00.11 sftp-server
471611 3488 20 0 9508 412 404 S 0.0 0.0 0:00.00 control
471612 3488 20 0 181m 2152 1720 S 0.0 0.0 0:00.01 logshifter
471624 3488 20 0 4072 456 448 S 0.0 0.0 0:00.00 scl
471625 3488 20 0 9236 812 808 S 0.0 0.0 0:00.00 bash
471639 3488 20 0 373m 14m 13m S 0.0 0.1 0:03.53 mongod
475123 3488 20 0 778m 5264 5172 S 0.0 0.0 0:00.08 node
475124 3488 20 0 117m 2148 1708 S 0.0 0.0 0:00.00 logshifter
475151 3488 20 0 863m 114m 6776 S 0.0 0.8 0:04.10 node
6) Review memory usage. (free displays statistics about memory usage: total, free, used, physical, swap, shared, and buffers used by the kernel. Options: -b: show memory in bytes. -k: show memory in kilobytes (default). -m: show memory in megabytes.)
free
total used free shared buffers cached
Mem: 15297608 14767896 529712 766468 36484 2746820
-/+ buffers/cache: 11984592 3313016
Swap: 52428792 16334312 36094480
This is where I went astray. There is still a little free memory, but it doesn't take much to see that under intensive I/O I'm going to run out fast. When that happened I saw no error log or message at all; things just stopped working.
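The "-/+ buffers/cache" row in the free output above is derived from the "Mem:" row, which is easy to verify by hand. The sketch below redoes that arithmetic with the numbers from the listing; the function name buffers_cache_view is hypothetical. It assumes the older free layout shown here (newer versions of free print an "available" column instead of this row).

```python
def buffers_cache_view(used, free, buffers, cached):
    """Reproduce the '-/+ buffers/cache' row of the older `free` layout.

    Returns (memory used by applications, memory that could be reclaimed
    or is already free), both in the same unit as the inputs.
    """
    used_by_apps = used - buffers - cached
    effectively_free = free + buffers + cached
    return used_by_apps, effectively_free


if __name__ == "__main__":
    # Kilobyte values copied from the "Mem:" row above.
    apps, avail = buffers_cache_view(
        used=14767896, free=529712, buffers=36484, cached=2746820
    )
    print(f"used by applications: {apps} kB")   # matches 11984592 in the listing
    print(f"effectively free:     {avail} kB")  # matches 3313016 in the listing
```

The second number is the more useful one when judging headroom, because buffers and cache can be given back to applications on demand.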
7) Check your sockets. (ss - socket statistics. The output will contain all tcp, udp and unix socket connection details. )
ss
State Recv-Q Send-Q Local Address:Port Peer Address:Port
(in this case there are no open sockets.. the line above is just the column headers..)
8) Check vmstat. (vmstat gives summary information about memory, processes, paging, etc. free: amount of free/idle memory. si: memory swapped in from disk per second, in kilobytes. so: memory swapped out to disk per second, in kilobytes.)
vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 16248996 425248 33476 2946912 88 90 321 247 4 3 12 10 78 0 0
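The si and so columns are the ones to watch in the vmstat output above. As a rough sketch, the text can be turned into a dictionary and tested for swap activity; the names parse_vmstat and is_swapping are made up for illustration. Keep in mind that a single vmstat run without an interval reports averages since boot.

```python
def parse_vmstat(output):
    """Map vmstat column names to the integer values of the last sample."""
    lines = [line.split() for line in output.strip().splitlines()]
    header, values = lines[-2], lines[-1]
    return dict(zip(header, (int(v) for v in values)))


def is_swapping(stats):
    """True when pages are being swapped in or out."""
    return stats["si"] > 0 or stats["so"] > 0


if __name__ == "__main__":
    # Output copied from the vmstat run above.
    sample = """
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 16248996 425248 33476 2946912 88 90 321 247 4 3 12 10 78 0 0
"""
    stats = parse_vmstat(sample)
    print("si =", stats["si"], "so =", stats["so"], "swapping:", is_swapping(stats))
```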
9) Check I/O stats. (iostat – Central Processing Unit (CPU) statistics and input/output statistics for devices and partitions.)
iostat
Linux 2.6.32-573.12.1.el6.x86_64 (ex-std-node842.prod.rhcloud.com) 03/14/2016 _x86_64_ (4 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
11.60 0.12 10.21 0.49 0.06 77.52
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
xvda 6.56 197.23 147.83 238703267 178916836
xvdf 15.08 337.29 347.44 408209376 420504392
xvdg 15.13 337.45 347.44 408413143 420502512
xvdp 65.18 1603.17 1060.59 1940282568 1283607613
dm-0 7.97 108.87 33.25 131768290 40238544
dm-1 70.00 1574.18 1060.36 1905191416 1283329611
dm-2 3.48 87.89 114.58 106366791 138678084
10) Check per-processor stats. (mpstat reports processor-related statistics.)
mpstat
Linux 2.6.32-573.12.1.el6.x86_64 (ex-std-node842.prod.rhcloud.com) 03/14/2016 _x86_64_ (4 CPU)
01:10:59 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
01:10:59 AM all 11.60 0.12 10.01 0.49 0.00 0.21 0.06 0.00 77.52
11) User Limits (ulimit User limits - limit the use of system-wide resources. -a All current limits are reported. )
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 59663
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 350
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
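A few of the limits listed by ulimit -a can also be read from inside a process. This is a minimal sketch using Python's standard resource module (Unix only); the selection of limits and the name current_limits are my own choices. Note that resource reports the stack size in bytes, whereas ulimit prints it in kilobytes.

```python
import resource

# A handful of the limits that `ulimit -a` prints, keyed by its labels.
LIMITS = {
    "open files (-n)": resource.RLIMIT_NOFILE,
    "max user processes (-u)": resource.RLIMIT_NPROC,
    "stack size (-s)": resource.RLIMIT_STACK,
    "core file size (-c)": resource.RLIMIT_CORE,
}


def current_limits():
    """Return the soft limit for each entry in LIMITS."""
    report = {}
    for label, key in LIMITS.items():
        soft, _hard = resource.getrlimit(key)
        report[label] = "unlimited" if soft == resource.RLIM_INFINITY else soft
    return report


if __name__ == "__main__":
    for label, value in current_limits().items():
        print(f"{label:28} {value}")
```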

Related

can't run '/etc/init.d/rcS': No such file or directory

I am trying to emulate a firmware image using qemu. During booting, I get the following error
can't run '/etc/init.d/rcS': No such file or directory
can't open /dev/ttyS0: No such file or directory
can't open /dev/ttyS0: No such file or directory
can't open /dev/ttyS0: No such file or directory
.
.
.
This is the content of the inittab file
# Startup the system
null::sysinit:/etc/init.d/rc.sysinit
# now run any rc scripts
::sysinit:/etc/init.d/rcS
# Put a getty on the serial port
ttyS0::respawn:/sbin/getty -L ttyS0 115200 vt100
# Stuff to do before rebooting
null::shutdown:/bin/umount -a -r
It is able to run the rc.sysinit, but not the rcS.
I have checked permissions of the rcS. Also, the filesystem is mounted as read-only cramfs. Could this be causing an issue?
This is the command I am running:
QEMU_AUDIO_DRV=none qemu-system-arm -m 256M -M versatilepb \
  -kernel ~/linux-2.6.23/arch/arm/boot/zImage \
  -append "console=ttyAMA0,115200 root=/dev/ram rdinit=/sbin/init" \
  -initrd ~/tmpcramfs2 \
  -nographic
These are the boot messages obtained on running the command:
Linux version 2.6.23 (hsailer@SvanteArrhenius) (gcc version 4.0.2) #1 Thu May 27 09:31:10 EDT 2021
CPU: ARM926EJ-S [41069265] revision 5 (ARMv5TEJ), cr=00093177
Machine: ARM-Versatile PB
Memory policy: ECC disabled, Data cache writeback
CPU0: D VIVT write-through cache
CPU0: I cache: 4096 bytes, associativity 4, 32 byte lines, 32 sets
CPU0: D cache: 65536 bytes, associativity 4, 32 byte lines, 512 sets
Built 1 zonelists in Zone order. Total pages: 65024
Kernel command line: console=ttyAMA0,115200 root=/dev/ram rdinit=/sbin/init
PID hash table entries: 1024 (order: 10, 4096 bytes)
Console: colour dummy device 80x30
Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
Memory: 256MB = 256MB total
Memory: 249600KB available (2508K code, 227K data, 100K init)
Mount-cache hash table entries: 512
CPU: Testing write buffer coherency: ok
NET: Registered protocol family 16
NET: Registered protocol family 2
Time: timer3 clocksource has been installed.
IP route cache hash table entries: 2048 (order: 1, 8192 bytes)
TCP established hash table entries: 8192 (order: 4, 65536 bytes)
TCP bind hash table entries: 8192 (order: 3, 32768 bytes)
TCP: Hash tables configured (established 8192 bind 8192)
TCP reno registered
checking if image is initramfs...it isn't (bad gzip magic numbers); looks like an initrd
Freeing initrd memory: 7184K
NetWinder Floating Point Emulator V0.97 (double precision)
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
JFFS2 version 2.2. (NAND) © 2001-2006 Red Hat, Inc.
JFS: nTxBlock = 2007, nTxLock = 16063
io scheduler noop registered
io scheduler anticipatory registered (default)
io scheduler deadline registered
io scheduler cfq registered
CLCD: Versatile hardware, VGA display
Clock CLCDCLK: setting VCO reg params: S=1 R=99 V=98
Console: switching to colour frame buffer device 80x60
Serial: AMBA PL011 UART driver
dev:f1: ttyAMA0 at MMIO 0x101f1000 (irq = 12) is a AMBA/PL011
console [ttyAMA0] enabled
dev:f2: ttyAMA1 at MMIO 0x101f2000 (irq = 13) is a AMBA/PL011
dev:f3: ttyAMA2 at MMIO 0x101f3000 (irq = 14) is a AMBA/PL011
fpga:09: ttyAMA3 at MMIO 0x10009000 (irq = 38) is a AMBA/PL011
RAMDISK driver initialized: 16 RAM disks of 8192K size 1024 blocksize
smc91x.c: v1.1, sep 22 2004 by Nicolas Pitre <nico@cam.org>
eth0: SMC91C11xFD (rev 1) at d098e000 IRQ 25 [nowait]
eth0: Ethernet addr: 52:54:00:12:34:56
armflash.0: Found 1 x32 devices at 0x0 in 32-bit bank
Intel/Sharp Extended Query Table at 0x0031
Using buffer write method
RedBoot partition parsing not available
afs partition parsing not available
armflash: probe of armflash.0 failed with error -22
mice: PS/2 mouse device common for all mice
input: AT Raw Set 2 keyboard as /class/input/input0
TCP cubic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
VFP support v0.3: implementor 41 architecture 1 part 10 variant 9 rev 0
input: ImExPS/2 Generic Explorer Mouse as /class/input/input1
RAMDISK: cramfs filesystem found at block 0
RAMDISK: Loading 7184KiB [1 disk] into ram disk... done.
VFS: Mounted root (cramfs filesystem) readonly.
Freeing init memory: 100K
can't run '/etc/init.d/rcS': No such file or directory
can't open /dev/ttyS0: No such file or directory
can't open /dev/ttyS0: No such file or directory
can't open /dev/ttyS0: No such file or directory
.
.
.
The errors about /dev/ttyS0 are because your inittab is specifying the wrong device name for the serial port for the (emulated) hardware you're running on. Your QEMU command specifies the 'versatilepb' board, whose serial devices are PL011s, which appear in /dev/ as /dev/ttyAMA0, /dev/ttyAMA1, etc. (/dev/ttyS0 is what the serial ports on an x86 PC appear as.) You need to fix that line of the inittab to refer to ttyAMA0 instead.
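The inittab change described above can be sketched as a small text rewrite. The helper name retarget_serial is hypothetical. Because the root filesystem is a read-only cramfs, the assumption here is that you apply the change to an extracted copy of etc/inittab and then rebuild the image.

```python
def retarget_serial(inittab_text, old="ttyS0", new="ttyAMA0"):
    """Point inittab entries at a different serial device.

    Comment lines are left untouched; every other occurrence of `old`
    is replaced by `new`.
    """
    fixed = []
    for line in inittab_text.splitlines():
        if not line.lstrip().startswith("#"):
            line = line.replace(old, new)
        fixed.append(line)
    return "\n".join(fixed)


if __name__ == "__main__":
    # The getty entry from the inittab in the question.
    original = (
        "# Put a getty on the serial port\n"
        "ttyS0::respawn:/sbin/getty -L ttyS0 115200 vt100"
    )
    print(retarget_serial(original))
```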
For the rcS error, I would suggest you start by double-checking all the things listed in all the responses to this older question.

WordPress Database Keeps crashing frequently

I have a website that is using WordPress + WooCommerce to manage an e-commerce website. Right now we are using a plugin called: "WP All Import" with the WooCommerce Add-on to Import from a .CSV file all the product data (SKU, Title, Description, Price, Image link, etc).
The problem is that when we run the import, it frequently crashes with an error message.
We asked our host and they answered with the following:
"
We are sorry for the server issues. The website requests are causing the MariaDB to allocate all CPU resources and making the server restart to kill the processes
Nov 25 13:29:35 server mysqld: 2020-11-25 13:29:35 140531759913152 [Note] /usr/sbin/mysqld (mysqld 10.2.36-MariaDB) starting as process 22045 ...
Nov 25 13:29:35 server mysqld: 2020-11-25 13:29:35 140531759913152 [Warning] Could not increase number of max_open_files to more than 524288 (request: 524423)
top - 13:33:21 up 1:02, 2 users, load average: 9.48, 10.41, 10.28
Tasks: 145 total, 24 running, 119 sleeping, 0 stopped, 2 zombie
%Cpu(s): 85.3 us, 14.5 sy, 0.0 ni, 0.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 4194304 total, 1292680 free, 2622632 used, 278992 buff/cache
KiB Swap: 524288 total, 338664 free, 185624 used. 1390602 avail Mem
"
Checking with tools like GTmetrix, our performance is not as good as it should be: we get an "E" score, and the most important recommendation is to reduce the number of DOM elements (currently approx. 2100).
Thanks in advance

Overload due to apache process and Mysql process

I have a small site running, and only 20 suppliers access it for queries. The server runs under high load during peak hours. Please find the output below:
top - 10:15:42 up 32 days, 20:08, 4 users, load average: 2.20, 2.06, 1.94
Tasks: 500 total, 1 running, 498 sleeping, 0 stopped, 1 zombie
Cpu(s): 7.1%us, 2.3%sy, 0.0%ni, 90.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32931056k total, 3124852k used, 29806204k free, 49508k buffers
Swap: 3999740k total, 0k used, 3999740k free, 1364836k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10130 mysql 20 0 6207m 567m 5468 S 232 1.8 14306:04 mysqld
27534 worldsto 20 0 307m 20m 5364 S 5 0.1 0:01.97 apache2
29237 worldsto 20 0 299m 12m 3696 S 2 0.0 0:00.07 apache2
29003 worldsto 20 0 299m 13m 3716 S 1 0.0 0:00.12 apache2
root@server70:~# ps -ef | grep apache | wc
434 2368 17756
CPU(s): 24
RAM size: 32 GB
From what I have seen from the Apache logs, all the connections are coming from suppliers and company IP addresses. I am sure there is something wrong with the Apache process so that MYSQL is using more CPU load.
Please someone help me to identify and fix this problem. Thanks
The best troubleshooting step you can do is this:
connect to your MySQL server process, and type:
SHOW FULL PROCESSLIST
That will show you every query that's running. You will probably see the same query showing up multiple times, perhaps with different ID's - maybe something like:
SELECT * FROM foo WHERE fooid='1'
SELECT * FROM foo WHERE fooid='2'
...etc...
That means you need an index on 'fooid'.
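Spotting "the same query with different IDs" by eye gets tedious when the process list is long. One possible approach, sketched below, is to replace literal values with a placeholder and count how often each resulting pattern occurs. The function names are made up for illustration, the normalization is deliberately crude, and the query strings are assumed to have been copied out of the Info column of SHOW FULL PROCESSLIST.

```python
import re
from collections import Counter


def normalize(query):
    """Replace literal values with '?' so similar queries compare equal."""
    query = re.sub(r"'[^']*'", "?", query)  # quoted strings
    query = re.sub(r"\b\d+\b", "?", query)  # bare numbers
    return " ".join(query.split())          # collapse whitespace


def top_patterns(queries, n=5):
    """Return the n most frequent query patterns with their counts."""
    return Counter(normalize(q) for q in queries).most_common(n)


if __name__ == "__main__":
    running = [
        "SELECT * FROM foo WHERE fooid='1'",
        "SELECT * FROM foo WHERE fooid='2'",
    ]
    for pattern, count in top_patterns(running):
        print(count, pattern)
```

A pattern that dominates the list is a good first candidate for EXPLAIN and, if the filtered column has no index, for adding one.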

NTFS/GPT Mount exited with Exit Code 13

This is a duplicated post since I didn't get any help on askubuntu.com.
I have a 1TB external hard drive that I recently formatted to NTFS. It was mounting on my Ubuntu 11.10 fine until just now. I didn't make any changes to affect my OS or my exhdd.
The error that I get is:
Error mounting: mount exited with exit code 13: $MFTMirr does not match $MFT (record 0).
Failed to mount '/dev/sdb2': Input/output error
NTFS is either inconsistent, or there is a hardware fault, or it's a
SoftRAID/FakeRAID hardware. In the first case run chkdsk /f on Windows
then reboot into Windows twice. The usage of the /f parameter is very
important! If the device is a SoftRAID/FakeRAID then first activate
it and mount a different device under the /dev/mapper/ directory, (e.g.
/dev/mapper/nvidia_eahaabcc1). Please see the 'dmraid' documentation
for more details.
I did read this and this. But neither helped.
I tried installing ntfsfix but no such package exists anymore.
I have never used this HDD on a windows machine. If I need to use an other machine to do stuff to fix this, I have access to a mac.
Any advice?
This is my sudo fdisk -l output:
What in the world is GPT? I didn't do that. It used to be NTFS.
Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000586fb
Device Boot Start End Blocks Id System
/dev/sda1 * 2148 961320312 480659082+ 83 Linux
/dev/sda2 961320313 976773167 7726427+ 5 Extended
/dev/sda5 961320314 976773167 7726427 83 Linux
WARNING: GPT (GUID Partition Table) detected on '/dev/sdb'! The util fdisk doesn't support GPT. Use GNU Parted.
Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xcfd88605
Device Boot Start End Blocks Id System
/dev/sdb1 1 1953525167 976762583+ ee GPT
This is the thing that worked:
I first needed to get ntfs-3g (sudo apt-get install ntfs-3g)
Run sudo fdisk -l to find the device name of the partition. Mine was /dev/sdb1.
I ran ntfsfix -b /dev/sdb1 and that fixed the problem.
Error mounting: mount exited with exit code 13: $MFTMirr does not match $MFT (record 0). Failed to mount '/dev/sda1': Input/output error
NTFS is either inconsistent, or there is a hardware fault, or it's a SoftRAID/FakeRAID hardware. In the first case run chkdsk /f on Windows then reboot into Windows twice. The usage of the /f parameter is very important! If the device is a SoftRAID/FakeRAID then first activate it and mount a different device under the /dev/mapper/ directory, (e.g. /dev/mapper/nvidia_eahaabcc1).
Please see the 'dmraid' documentation for more details.
Solution:
sudo fdisk -l
sudo ntfsfix /dev/select_disk_name
To find the disk name:
Go to the dashboard -> Disk Utility -> click the disk -> the Device field shows /dev/***

A top-like utility for monitoring CUDA activity on a GPU [closed]

I'm trying to monitor a process that uses CUDA and MPI, is there any way I could do this, something like the command "top" but that monitors the GPU too?
To get real-time insight on used resources, do:
nvidia-smi -l 1
This will loop, refreshing the view every second.
If you do not want to keep past traces of the looped call in the console history, you can also do:
watch -n0.1 nvidia-smi
Where 0.1 is the time interval, in seconds.
I find gpustat very useful. It can be installed with pip install gpustat, and prints breakdown of usage by processes or users.
I'm not aware of anything that combines this information, but you can use the nvidia-smi tool to get the raw data, like so (thanks to @jmsu for the tip on -l):
$ nvidia-smi -q -g 0 -d UTILIZATION -l
==============NVSMI LOG==============
Timestamp : Tue Nov 22 11:50:05 2011
Driver Version : 275.19
Attached GPUs : 2
GPU 0:1:0
Utilization
Gpu : 0 %
Memory : 0 %
Recently, I have written a monitoring tool called nvitop, the interactive NVIDIA-GPU process viewer.
It is written in pure Python and is easy to install.
Install from PyPI:
pip3 install --upgrade nvitop
Install the latest version from GitHub (recommended):
pip3 install git+https://github.com/XuehaiPan/nvitop.git#egg=nvitop
Run as a resource monitor:
nvitop -m
nvitop will show the GPU status like nvidia-smi but with additional fancy bars and history graphs.
For the processes, it will use psutil to collect process information and display the USER, %CPU, %MEM, TIME and COMMAND fields, which is much more detailed than nvidia-smi. Besides, it is responsive for user inputs in monitor mode. You can interrupt or kill your processes on the GPUs.
nvitop comes with a tree-view screen and an environment screen:
In addition, nvitop can be integrated into other applications. For example, integrate into PyTorch training code:
import os
from nvitop.core import host, CudaDevice, HostProcess, GpuProcess
from torch.utils.tensorboard import SummaryWriter
device = CudaDevice(0)
this_process = GpuProcess(os.getpid(), device)
writer = SummaryWriter()
for epoch in range(n_epochs):
    # some training code here
    # ...
    this_process.update_gpu_status()
    writer.add_scalars(
        'monitoring',
        {
            'device/memory_used': float(device.memory_used()) / (1 << 20),  # convert bytes to MiBs
            'device/memory_percent': device.memory_percent(),
            'device/memory_utilization': device.memory_utilization(),
            'device/gpu_utilization': device.gpu_utilization(),
            'host/cpu_percent': host.cpu_percent(),
            'host/memory_percent': host.virtual_memory().percent,
            'process/cpu_percent': this_process.cpu_percent(),
            'process/memory_percent': this_process.memory_percent(),
            'process/used_gpu_memory': float(this_process.gpu_memory()) / (1 << 20),  # convert bytes to MiBs
            'process/gpu_sm_utilization': this_process.gpu_sm_utilization(),
            'process/gpu_memory_utilization': this_process.gpu_memory_utilization(),
        },
        global_step
    )
See https://github.com/XuehaiPan/nvitop for more details.
Note: nvitop is dual-licensed by the GPLv3 License and Apache-2.0 License. Please feel free to use it as a dependency for your own projects. See Copyright Notice for more details.
Just use watch nvidia-smi; by default it refreshes the output every 2 seconds.
You can also use watch -n 5 nvidia-smi (-n 5 sets a 5-second interval).
Use the "--query-compute-apps=" argument:
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
For further help, run:
nvidia-smi --help-query-compute-apps
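The CSV produced by the --query-compute-apps call above is easy to consume from a script. The sketch below parses such text into dictionaries and includes an optional wrapper that runs the command shown above. The function names are hypothetical, and the sample text in the demo is illustrative only: its header spelling is an assumption, so check what your driver version actually prints.

```python
import csv
import io
import subprocess


def parse_compute_apps(csv_text):
    """Turn nvidia-smi CSV output into a list of dicts keyed by the header."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    rows = [row for row in reader if row]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def read_compute_apps():
    """Run the command from the answer above and parse its output."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv"],
        capture_output=True, text=True, check=True,
    )
    return parse_compute_apps(result.stdout)


if __name__ == "__main__":
    # Illustrative sample, not captured from a real GPU.
    sample = "pid, process_name, used_gpu_memory [MiB]\n7336, ./align, 61 MiB\n"
    for app in parse_compute_apps(sample):
        print(app)
```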
You can try nvtop, which is similar to the widely used htop tool but for NVIDIA GPUs.
Download and install the latest stable CUDA driver (4.2). On Linux, nvidia-smi 295.41 gives you just what you want:
[root@localhost release]# nvidia-smi
Wed Sep 26 23:16:16 2012
+------------------------------------------------------+
| NVIDIA-SMI 3.295.41 Driver Version: 295.41 |
|-------------------------------+----------------------+----------------------+
| Nb. Name | Bus Id Disp. | Volatile ECC SB / DB |
| Fan Temp Power Usage /Cap | Memory Usage | GPU Util. Compute M. |
|===============================+======================+======================|
| 0. Tesla C2050 | 0000:05:00.0 On | 0 0 |
| 30% 62 C P0 N/A / N/A | 3% 70MB / 2687MB | 44% Default |
|-------------------------------+----------------------+----------------------|
| Compute processes: GPU Memory |
| GPU PID Process name Usage |
|=============================================================================|
| 0. 7336 ./align 61MB |
+-----------------------------------------------------------------------------+
EDIT: In latest NVIDIA drivers, this support is limited to Tesla Cards.
Another useful monitoring approach is to use ps filtered on processes that consume your GPUs. I use this one a lot:
ps f -o user,pgrp,pid,pcpu,pmem,start,time,command -p `lsof -n -w -t /dev/nvidia*`
That'll show all nvidia GPU-utilizing processes and some stats about them. lsof ... retrieves a list of all processes using an nvidia GPU owned by the current user, and ps -p ... shows ps results for those processes. ps f shows nice formatting for child/parent process relationships / hierarchies, and -o specifies a custom formatting. That one is similar to just doing ps u but adds the process group ID and removes some other fields.
One advantage of this over nvidia-smi is that it'll show process forks as well as main processes that use the GPU.
One disadvantage, though, is it's limited to processes owned by the user that executes the command. To open it up to all processes owned by any user, I add a sudo before the lsof.
Lastly, I combine it with watch to get a continuous update. So, in the end, it looks like:
watch -n 0.1 'ps f -o user,pgrp,pid,pcpu,pmem,start,time,command -p `sudo lsof -n -w -t /dev/nvidia*`'
Which has output like:
Every 0.1s: ps f -o user,pgrp,pid,pcpu,pmem,start,time,command -p `sudo lsof -n -w -t /dev/nvi... Mon Jun 6 14:03:20 2016
USER PGRP PID %CPU %MEM STARTED TIME COMMAND
grisait+ 27294 50934 0.0 0.1 Jun 02 00:01:40 /opt/google/chrome/chrome --type=gpu-process --channel=50877.0.2015482623
grisait+ 27294 50941 0.0 0.0 Jun 02 00:00:00 \_ /opt/google/chrome/chrome --type=gpu-broker
grisait+ 53596 53596 36.6 1.1 13:47:06 00:05:57 python -u process_examples.py
grisait+ 53596 33428 6.9 0.5 14:02:09 00:00:04 \_ python -u process_examples.py
grisait+ 53596 33773 7.5 0.5 14:02:19 00:00:04 \_ python -u process_examples.py
grisait+ 53596 34174 5.0 0.5 14:02:30 00:00:02 \_ python -u process_examples.py
grisait+ 28205 28205 905 1.5 13:30:39 04:56:09 python -u train.py
grisait+ 28205 28387 5.8 0.4 13:30:49 00:01:53 \_ python -u train.py
grisait+ 28205 28388 5.3 0.4 13:30:49 00:01:45 \_ python -u train.py
grisait+ 28205 28389 4.5 0.4 13:30:49 00:01:29 \_ python -u train.py
grisait+ 28205 28390 4.5 0.4 13:30:49 00:01:28 \_ python -u train.py
grisait+ 28205 28391 4.8 0.4 13:30:49 00:01:34 \_ python -u train.py
This may not be elegant, but you can try
while true; do sleep 2; nvidia-smi; done
I also tried the method by @Edric, which works, but I prefer the original layout of nvidia-smi.
You can use the monitoring program glances with its GPU monitoring plug-in. It is open source.
To install: sudo apt-get install -y python-pip; sudo pip install glances[gpu]
To launch: sudo glances
It also monitors the CPU, disk I/O, disk space, network, and a few other things.
In Linux Mint, and most likely Ubuntu, you can try "nvidia-smi --loop=1"
If you just want to find the processes running on the GPU, you can simply use the following command:
lsof /dev/nvidia*
For me, nvidia-smi and watch -n 1 nvidia-smi are enough in most cases. Sometimes nvidia-smi shows no processes even though the GPU memory is used up, so I need the command above to find them.
I created a batch file with the following code on a Windows machine to refresh the output every second. It works for me.
:loop
cls
"C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi"
timeout /T 1
goto loop
The nvidia-smi executable is usually located under "C:\Program Files\NVIDIA Corporation" if you want to run the command only once.
You can use nvidia-smi pmon -i 0 to monitor every process on GPU 0, including compute mode, SM usage, memory usage, encoder usage and decoder usage.
There is the Prometheus GPU Metrics Exporter (PGME), which leverages the nvidia-smi binary. You may try it out. Once you have the exporter running, you can access it via http://localhost:9101/metrics. For two GPUs, the sample result looks like this:
temperature_gpu{gpu="TITAN X (Pascal)[0]"} 41
utilization_gpu{gpu="TITAN X (Pascal)[0]"} 0
utilization_memory{gpu="TITAN X (Pascal)[0]"} 0
memory_total{gpu="TITAN X (Pascal)[0]"} 12189
memory_free{gpu="TITAN X (Pascal)[0]"} 12189
memory_used{gpu="TITAN X (Pascal)[0]"} 0
temperature_gpu{gpu="TITAN X (Pascal)[1]"} 78
utilization_gpu{gpu="TITAN X (Pascal)[1]"} 95
utilization_memory{gpu="TITAN X (Pascal)[1]"} 59
memory_total{gpu="TITAN X (Pascal)[1]"} 12189
memory_free{gpu="TITAN X (Pascal)[1]"} 1738
memory_used{gpu="TITAN X (Pascal)[1]"} 10451
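If you want to consume the metrics text above without a full Prometheus setup, the lines can be parsed directly. This is a rough sketch that only handles the single-label format shown in the sample; the name parse_metrics and the regular expression are my own, not part of the exporter.

```python
import re

# Matches lines such as: utilization_gpu{gpu="TITAN X (Pascal)[1]"} 95
LINE = re.compile(r'^(\w+)\{gpu="(.+)"\}\s+(\S+)$')


def parse_metrics(text):
    """Group metric values by GPU label: {gpu: {metric_name: value}}."""
    result = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            name, gpu, value = match.groups()
            result.setdefault(gpu, {})[name] = float(value)
    return result


if __name__ == "__main__":
    # Two lines copied from the sample result above.
    sample = (
        'utilization_gpu{gpu="TITAN X (Pascal)[1]"} 95\n'
        'memory_used{gpu="TITAN X (Pascal)[1]"} 10451\n'
    )
    for gpu, metrics in parse_metrics(sample).items():
        print(gpu, metrics)
```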
Run nvidia-smi in device monitoring mode, e.g.:
$ nvidia-smi dmon -d 3 -s pcvumt
# gpu pwr gtemp mtemp mclk pclk pviol tviol sm mem enc dec fb bar1 rxpci txpci
# Idx W C C MHz MHz % bool % % % % MB MB MB/s MB/s
0 273 54 - 9501 2025 0 0 100 11 0 0 18943 75 5906 659
0 280 54 - 9501 2025 0 0 100 11 0 0 18943 75 7404 650
0 277 54 - 9501 2025 0 0 100 11 0 0 18943 75 7386 719
0 279 55 - 9501 2025 0 0 99 11 0 0 18945 75 6592 692
0 281 55 - 9501 2025 0 0 99 11 0 0 18945 75 7760 641
0 279 55 - 9501 2025 0 0 99 11 0 0 18945 75 7775 668
0 279 55 - 9501 2025 0 0 100 11 0 0 18947 75 7589 690
0 281 55 - 9501 2025 0 0 99 12 0 0 18947 75 7514 657
0 279 55 - 9501 2025 0 0 100 11 0 0 18947 75 6472 558
0 280 54 - 9501 2025 0 0 100 11 0 0 18947 75 7066 683
Full details are in man nvidia-smi.
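The dmon output shown above is whitespace-separated, so it can be summarized with a few lines of code. The sketch below assumes the layout in the sample (a first "#" line with column names, a second "#" line with units, then data rows); the names parse_dmon and mean are made up for illustration.

```python
def parse_dmon(output):
    """Parse `nvidia-smi dmon` text into a list of {column: value} dicts."""
    columns, rows = None, []
    for line in output.strip().splitlines():
        line = line.strip()
        if line.startswith("#"):
            if columns is None:
                # The first comment line carries the column names;
                # the second one (units) is skipped.
                columns = line.lstrip("# ").split()
            continue
        rows.append(dict(zip(columns, line.split())))
    return rows


def mean(rows, column):
    """Average a numeric column, ignoring '-' placeholders."""
    values = [float(r[column]) for r in rows if r[column] != "-"]
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    # First three samples from the output above.
    sample = """
# gpu pwr gtemp mtemp mclk pclk pviol tviol sm mem enc dec fb bar1 rxpci txpci
# Idx W C C MHz MHz % bool % % % % MB MB MB/s MB/s
0 273 54 - 9501 2025 0 0 100 11 0 0 18943 75 5906 659
0 280 54 - 9501 2025 0 0 100 11 0 0 18943 75 7404 650
0 277 54 - 9501 2025 0 0 100 11 0 0 18943 75 7386 719
"""
    rows = parse_dmon(sample)
    print("mean power (W):", mean(rows, "pwr"))
    print("mean SM util (%):", mean(rows, "sm"))
```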