I am trying to emulate a firmware image using QEMU. During boot, I get the following errors:
can't run '/etc/init.d/rcS': No such file or directory
can't open /dev/ttyS0: No such file or directory
can't open /dev/ttyS0: No such file or directory
can't open /dev/ttyS0: No such file or directory
.
.
.
This is the content of the inittab file
# Startup the system
null::sysinit:/etc/init.d/rc.sysinit
# now run any rc scripts
::sysinit:/etc/init.d/rcS
# Put a getty on the serial port
ttyS0::respawn:/sbin/getty -L ttyS0 115200 vt100
# Stuff to do before rebooting
null::shutdown:/bin/umount -a -r
It is able to run rc.sysinit, but not rcS.
I have checked the permissions of rcS. Also, the filesystem is mounted as a read-only cramfs. Could this be causing the issue?
This is the command I am running:
QEMU_AUDIO_DRV=none qemu-system-arm -m 256M -M versatilepb \
    -kernel ~/linux-2.6.23/arch/arm/boot/zImage \
    -append "console=ttyAMA0,115200 root=/dev/ram rdinit=/sbin/init" \
    -initrd ~/tmpcramfs2 \
    -nographic
These are the boot messages obtained on running the command:
Linux version 2.6.23 (hsailer@SvanteArrhenius) (gcc version 4.0.2) #1 Thu May 27 09:31:10 EDT 2021
CPU: ARM926EJ-S [41069265] revision 5 (ARMv5TEJ), cr=00093177
Machine: ARM-Versatile PB
Memory policy: ECC disabled, Data cache writeback
CPU0: D VIVT write-through cache
CPU0: I cache: 4096 bytes, associativity 4, 32 byte lines, 32 sets
CPU0: D cache: 65536 bytes, associativity 4, 32 byte lines, 512 sets
Built 1 zonelists in Zone order. Total pages: 65024
Kernel command line: console=ttyAMA0,115200 root=/dev/ram rdinit=/sbin/init
PID hash table entries: 1024 (order: 10, 4096 bytes)
Console: colour dummy device 80x30
Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
Memory: 256MB = 256MB total
Memory: 249600KB available (2508K code, 227K data, 100K init)
Mount-cache hash table entries: 512
CPU: Testing write buffer coherency: ok
NET: Registered protocol family 16
NET: Registered protocol family 2
Time: timer3 clocksource has been installed.
IP route cache hash table entries: 2048 (order: 1, 8192 bytes)
TCP established hash table entries: 8192 (order: 4, 65536 bytes)
TCP bind hash table entries: 8192 (order: 3, 32768 bytes)
TCP: Hash tables configured (established 8192 bind 8192)
TCP reno registered
checking if image is initramfs...it isn't (bad gzip magic numbers); looks like an initrd
Freeing initrd memory: 7184K
NetWinder Floating Point Emulator V0.97 (double precision)
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
JFFS2 version 2.2. (NAND) © 2001-2006 Red Hat, Inc.
JFS: nTxBlock = 2007, nTxLock = 16063
io scheduler noop registered
io scheduler anticipatory registered (default)
io scheduler deadline registered
io scheduler cfq registered
CLCD: Versatile hardware, VGA display
Clock CLCDCLK: setting VCO reg params: S=1 R=99 V=98
Console: switching to colour frame buffer device 80x60
Serial: AMBA PL011 UART driver
dev:f1: ttyAMA0 at MMIO 0x101f1000 (irq = 12) is a AMBA/PL011
console [ttyAMA0] enabled
dev:f2: ttyAMA1 at MMIO 0x101f2000 (irq = 13) is a AMBA/PL011
dev:f3: ttyAMA2 at MMIO 0x101f3000 (irq = 14) is a AMBA/PL011
fpga:09: ttyAMA3 at MMIO 0x10009000 (irq = 38) is a AMBA/PL011
RAMDISK driver initialized: 16 RAM disks of 8192K size 1024 blocksize
smc91x.c: v1.1, sep 22 2004 by Nicolas Pitre <nico@cam.org>
eth0: SMC91C11xFD (rev 1) at d098e000 IRQ 25 [nowait]
eth0: Ethernet addr: 52:54:00:12:34:56
armflash.0: Found 1 x32 devices at 0x0 in 32-bit bank
Intel/Sharp Extended Query Table at 0x0031
Using buffer write method
RedBoot partition parsing not available
afs partition parsing not available
armflash: probe of armflash.0 failed with error -22
mice: PS/2 mouse device common for all mice
input: AT Raw Set 2 keyboard as /class/input/input0
TCP cubic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
VFP support v0.3: implementor 41 architecture 1 part 10 variant 9 rev 0
input: ImExPS/2 Generic Explorer Mouse as /class/input/input1
RAMDISK: cramfs filesystem found at block 0
RAMDISK: Loading 7184KiB [1 disk] into ram disk... done.
VFS: Mounted root (cramfs filesystem) readonly.
Freeing init memory: 100K
can't run '/etc/init.d/rcS': No such file or directory
can't open /dev/ttyS0: No such file or directory
can't open /dev/ttyS0: No such file or directory
can't open /dev/ttyS0: No such file or directory
.
.
.
The errors about /dev/ttyS0 appear because your inittab specifies the wrong serial device name for the (emulated) hardware you're running on. Your QEMU command specifies the 'versatilepb' board, whose serial devices are PL011s, which appear in /dev/ as /dev/ttyAMA0, /dev/ttyAMA1, etc. (/dev/ttyS0 is what the serial ports on an x86 PC appear as.) Your boot log confirms this: the kernel registers ttyAMA0 to ttyAMA3 and no ttyS devices. You need to fix that line of the inittab to refer to ttyAMA0 instead.
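The inittab change described above can be sketched in a few lines. This is only an illustration: the function name retarget_serial_console is made up here, and it assumes you have the inittab text from an unpacked copy of the root filesystem (a cramfs image is read-only, so it has to be rebuilt after editing).

```python
def retarget_serial_console(inittab_text, old="ttyS0", new="ttyAMA0"):
    """Return inittab text with entries bound to `old` moved to `new`."""
    patched = []
    for line in inittab_text.splitlines():
        stripped = line.strip()
        # Leave comments and blank lines untouched.
        if stripped and not stripped.startswith("#"):
            # BusyBox inittab format: <id>:<runlevels>:<action>:<process>
            fields = line.split(":", 3)
            if len(fields) == 4 and fields[0] == old:
                fields[0] = new
                fields[3] = fields[3].replace(old, new)
                line = ":".join(fields)
        patched.append(line)
    return "\n".join(patched) + "\n"


original = (
    "null::sysinit:/etc/init.d/rc.sysinit\n"
    "ttyS0::respawn:/sbin/getty -L ttyS0 115200 vt100\n"
)
print(retarget_serial_console(original))
```

Only entries whose id field is the old device are rewritten, so the sysinit and shutdown lines stay exactly as they were.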
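For the rcS error, I would suggest you start by double-checking all the things listed in the responses to this older question. A common cause of "No such file or directory" for a script that does exist is that the interpreter named on its #! line is missing from the root filesystem, so that is worth checking first. The read-only cramfs mount by itself should not prevent a script from being executed.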
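One of those checks can be automated on the host. The sketch below assumes the cramfs image has been unpacked into a directory; the function name check_shebang and the rootfs path are hypothetical. It reports whether the interpreter on the script's #! line exists inside that tree, and also flags a CRLF line ending, which makes the kernel look for an interpreter name ending in a carriage return.

```python
import os


def check_shebang(rootfs, script_path):
    """Describe the state of the #! line of script_path inside rootfs."""
    host_path = os.path.join(rootfs, script_path.lstrip("/"))
    with open(host_path, "rb") as handle:
        first_line = handle.readline()
    if not first_line.startswith(b"#!"):
        return "no shebang line"
    if first_line.endswith(b"\r\n"):
        return "CRLF line ending on shebang line"
    words = first_line[2:].split()
    if not words:
        return "empty shebang line"
    interpreter = words[0].decode()
    target = os.path.join(rootfs, interpreter.lstrip("/"))
    # lexists() accepts symlinks such as /bin/sh -> busybox without
    # resolving them against the host's own root directory.
    if not os.path.lexists(target):
        return f"interpreter {interpreter} is missing"
    return "ok"


# Example call (the directory name is an assumption):
# print(check_shebang("./extracted-rootfs", "/etc/init.d/rcS"))
```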
I'm using tcpdump to capture TCP packets sent to port 3306, which is forwarded to a MySQL server:
sudo tcpdump -X -i ens5 -s 0 -tttt dst port 3306
I then executed the SQL select * from user_trading_volume limit 1 from a MySQL client.
The captured result is below:
2020-05-27 07:46:44.330084 IP ip-10-0-1-33.ap-northeast-2.compute.internal.59750 > ip-10-30-1-179.ap-northeast-2.compute.internal.mysql: Flags [P.], seq 1945:2020, ack 16715, win 512, options [nop,nop,TS val 3790143765 ecr 4258512397], length 75
0x0000: 4500 007f 54fb 4000 4006 ce8c 0a00 0121 E...T.@.@......!
0x0010: 0a1e 01b3 e966 0cea 76a0 9245 c975 2466 .....f..v..E.u$f
0x0020: 8018 0200 1763 0000 0101 080a e1e9 0115 .....c..........
0x0030: fdd3 be0d 1703 0300 46f5 525d 17c9 20ac ........F.R]....
0x0040: 62e6 fcdc ba82 11fc 91c2 c187 7ca8 a542 b...........|..B
0x0050: 6ed8 a1fa b1d8 01bd 1240 61d9 686e 183d n........@a.hn.=
0x0060: f2fc 9b9a a62d c212 8d4d e1c6 e67a 4bdc .....-...M...zK.
0x0070: ea2e 75dc 68cf 5c45 1721 2ced c511 ca ..u.h.\E.!,....
2020-05-27 07:46:44.331029 IP ip-10-0-1-33.ap-northeast-2.compute.internal.59750 > ip-10-30-1-179.ap-northeast-2.compute.internal.mysql: Flags [.], ack 17677, win 505, options [nop,nop,TS val 3790143766 ecr 4258513778], length 0
0x0000: 4500 0034 54fc 4000 4006 ced6 0a00 0121 E..4T.@.@......!
0x0010: 0a1e 01b3 e966 0cea 76a0 9290 c975 2828 .....f..v....u((
0x0020: 8010 01f9 1718 0000 0101 080a e1e9 0116 ................
0x0030: fdd3 c372
but the captured packet was not readable (that is, not ASCII).
I'm using AWS Aurora (MySQL 5.7).
Does anyone know what this packet means?
PS.
I tried it in my local environment too and could retrieve the matching SQL from the packet, as below
(I ran MySQL in a Docker container and executed the query through MySQL Workbench):
16:59:46.628631 IP (tos 0x0, ttl 64, id 59587, offset 0, flags [DF], proto TCP (6), length 98)
view-localhost.52652 > view-localhost.3318: Flags [P.], cksum 0xfe56 (incorrect -> 0x1538), seq 61:107, ack 899, win 512, options [nop,nop,TS val 632447157 ecr 632447154], length 46
E..b..#.#.S...............#....=.....V.....
%.`.%.`.*....select * from user_trading_volume limit 1
Looking at the first byte, this looks like two raw IP packets: 0x45 means IP version 4 with a typical 20-byte header (5 * 4 bytes). Wikipedia has more info on IP headers.
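To make that reading of the first byte concrete, here is a minimal sketch that decodes the fixed 20-byte IPv4 header of the first packet in the question. The hex string is copied from the dump; the function name parse_ipv4_header is only illustrative.

```python
import socket
import struct

# First 20 bytes of the first captured packet (offsets 0x0000 to 0x0013).
header = bytes.fromhex("4500007f54fb40004006ce8c0a0001210a1e01b3")


def parse_ipv4_header(data):
    """Decode the fixed part of an IPv4 header into a dict."""
    (ver_ihl, _tos, total_len, _ident, _flags_frag,
     ttl, proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", data[:20])
    return {
        "version": ver_ihl >> 4,             # high nibble of the first byte
        "header_len": (ver_ihl & 0x0F) * 4,  # low nibble counts 32-bit words
        "total_len": total_len,
        "ttl": ttl,
        "protocol": proto,                   # 6 is TCP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


fields = parse_ipv4_header(header)
print(fields)
```

The decoded addresses match the host names tcpdump printed (ip-10-0-1-33 and ip-10-30-1-179), and the total length of 127 bytes matches the size text2pcap reports for the first packet.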
Converting to pcap
Thus, we should be able to convert this back to a pcap. We can convert this text dump to a packet capture using text2pcap, which is a command line utility that ships with Wireshark.
With the given text as file temp, we can convert it into a pcap
$ cat temp | grep -v 2020 | cut -c3-49 | sed 's/ \(\w\w\)/ \1 /g' \
| text2pcap -l 101 - temp.pcap
Input from: Standard input
Output to: temp.pcap
Output format: pcap
Wrote packet of 127 bytes.
Wrote packet of 52 bytes.
Read 2 potential packets, wrote 2 packets (235 bytes).
Sanitizing text2pcap input
Here, we sanitize input so that text2pcap doesn't fail:
grep -v 2020: Remove the 2020... info lines
cut -c3-49: Remove the preceding 0x and the ASCII representation
sed 's/ \(\w\w\)/ \1 /g': Convert the hexdump from 2 bytes then space to 1 byte then space (09ab => 09 ab)
text2pcap -l 101 - temp.pcap: Read from stdin and write to temp.pcap as raw IP packets (see below)
You can now view this capture in Wireshark to see what the fields are.
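The same sanitizing step can be sketched in Python if grep, cut and sed are not at hand. This is an alternative to the pipeline above, not what the answer ran: the function name to_text2pcap is made up, and it assumes hex lines in the tcpdump -X layout (0x offset, colon, up to eight groups of four hex digits). On a short final line where the ASCII column is separated by a single space and happens to look like hex, it could pick up extra bytes.

```python
import re

# Offset, then up to 8 space-separated groups; the last group may be 1 byte.
HEX_LINE = re.compile(
    r"^\s*0x([0-9a-fA-F]{4}):\s+"
    r"((?:[0-9a-fA-F]{4} ){0,7}(?:[0-9a-fA-F]{2}){1,2})(?:\s|$)"
)


def to_text2pcap(dump):
    """Convert tcpdump -X hex lines to 'offset byte byte ...' lines."""
    out = []
    for line in dump.splitlines():
        match = HEX_LINE.match(line)
        if not match:
            continue  # timestamp/summary lines are dropped
        offset, groups = match.groups()
        digits = groups.replace(" ", "")
        pairs = [digits[i:i + 2] for i in range(0, len(digits), 2)]
        out.append(offset + " " + " ".join(pairs))
    return "\n".join(out) + "\n"


sample = (
    "2020-05-27 07:46:44.331029 IP a > b: Flags [.], length 0\n"
    "0x0000: 4500 0034 54fc 4000 4006 ced6 0a00 0121 E..4T.@.@......!\n"
    "0x0030: fdd3 c372\n"
)
print(to_text2pcap(sample), end="")
```

The output can be piped to text2pcap -l 101 - temp.pcap in the same way as the shell version; an offset of 0000 starts a new packet.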
Figuring out the linklayer number for text2pcap
Going back to the initial byte, that byte starts the IP layer when normally a link layer like Ethernet starts the packet. That means that we can't use the typical link layer of 1 (Ethernet). The link layer for raw IP is 101, so we need to specify that with text2pcap as -l 101. - is standard input, and then we write the file as temp.pcap.
What does the packet mean?
When loaded in Wireshark, packet 1 has a payload of 75 bytes, and it's not ASCII. You will probably want to decode these bytes manually using the MySQL protocol reference, because according to the docs:
The MySQL protocol is used between MySQL Clients and a MySQL Server.
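Before decoding by hand, it may be worth checking whether the payload is plain MySQL protocol at all. The sketch below uses the first 64 bytes of packet 1 from the question, skips the IP and TCP headers, and reads the first five payload bytes as if they were a TLS record header. That interpretation is an assumption, not something the capture states, but the numbers line up: content type 23 (application data), version 3.3 (the value TLS 1.2 uses), and a record length of 70, which with the 5-byte header gives exactly the 75-byte payload.

```python
import struct

# First 64 bytes of packet 1, copied from the hex dump in the question.
packet = bytes.fromhex(
    "4500007f54fb40004006ce8c0a000121"
    "0a1e01b3e9660cea76a09245c9752466"
    "80180200176300000101080ae1e90115"
    "fdd3be0d1703030046f5525d17c920ac"
)


def payload_summary(pkt):
    """Locate the TCP payload and read its first 5 bytes as a TLS record header."""
    ip_header_len = (pkt[0] & 0x0F) * 4
    total_len = struct.unpack("!H", pkt[2:4])[0]
    tcp = pkt[ip_header_len:]
    dst_port = struct.unpack("!H", tcp[2:4])[0]
    tcp_header_len = (tcp[12] >> 4) * 4  # data offset, in 32-bit words
    payload = tcp[tcp_header_len:]
    content_type, major, minor, record_len = struct.unpack("!BBBH", payload[:5])
    return {
        "dst_port": dst_port,
        "payload_len": total_len - ip_header_len - tcp_header_len,
        "content_type": content_type,
        "version": (major, minor),
        "record_len": record_len,
    }


summary = payload_summary(packet)
print(summary)
```

If the connection to Aurora is encrypted, the SQL text will not be visible in any capture taken on the wire, which would also explain why the local, unencrypted Docker setup shows the query in clear text.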
You can capture the data with tcpdump and use the -w option to write it to a file.
Then load that file in Wireshark.
https://www.wireshark.org/docs/wsug_html_chunked/AppToolstcpdump.html
How to create FiWare instance and connect it to internet?
I like the idea and I have big plans for using this infrastructure, but...
I've been trying to create an instance and make an SSH connection to it for some time now.
Created key-pair
Created security group (22,3306,1)
Created instance ubuntu 14 (also tried others)
Also tried ubuntu 12, POI and others already
Added node-int-net-01 and node-int-noinet-net-02 to it when creating
Also tried already with 1 network only
Allocated floating IP
Associated it with the local IP that came from "node-int-net-01"
Statuses:
Instance: ACTIVE, Power State RUNNING
"node-int-net-01" networks in list: shared-subnet 192.168.192.0/18 Yes ACTIVE UP
Inside "node-int-net-01":
Network: Admin State: DOWN, Shared: No, External Network: No
Subnet: DHCP and all ok
Ports: Status: BUILD, Admin State: UP
The confusing parts are (for clues; no need to answer these if we have a solution):
How can the network be EXTERNAL-SHARED-ACTIVE-UP and DOWN-NOT_SHARED-NO_EXTERNAL at the same time? Perhaps there's an error.
What does port status BUILD mean? It must have been building the port for about 3 days already. Should I build something there? Is it an order or a status? Perhaps it means BUILT or BUILDING instead.
What does instance ACTIVE mean? Is it still active (busy) and I should wait? Or can it be actively used already? From the VM Display I never saw it reach a unix prompt>; is FIWARE itself somehow using this telnet instance? Instead I saw things like
"request error",
"connection timeout",
"socket.error",
"Error 101 Network is unreachable".
"cloud-init-nonet [13:31]: waiting 120 seconds for network device"
numerous black screens and a never-ending "Booting from hard disk"
from the Instance log I saw an endless "Waiting for network configuration", but that one was cured
Though I did see a "localhost login" prompt, as I only created a PEM key I
can't imagine what to do with it: where do I get root/pwd? But I guess it was some error that it ended up there.
The latest status from Instance\Log is:
cloud-init-nonet[4.52]: static networking is now up
* Starting configure network device [ OK ]
* Starting Mount network filesystems [ OK ]
* Stopping Mount network filesystems [ OK ]
* Stopping cold plug devices [ OK ]
* Stopping log initial device creation [ OK ]
* Starting enable remaining boot-time encrypted block devices [ OK ]
Cloud-init v. 0.7.5 running 'init' at Sat, 16 Apr 2016 01:23:11 +0000. Up 5.07 seconds.
ci-info: ++++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++
ci-info: +--------+------+-----------------+---------------+-------------------+
ci-info: | Device | Up | Address | Mask | Hw-Address |
ci-info: +--------+------+-----------------+---------------+-------------------+
ci-info: | lo | True | 127.0.0.1 | 255.0.0.0 | . |
ci-info: | eth0 | True | 192.168.242.127 | 255.255.192.0 | fa:16:3e:7a:47:94 |
ci-info: +--------+------+-----------------+---------------+-------------------+
ci-info: +++++++++++++++++++++++++++++++++Route info++++++++++++++++++++++++++++++++++
ci-info: +-------+---------------+---------------+---------------+-----------+-------+
ci-info: | Route | Destination | Gateway | Genmask | Interface | Flags |
ci-info: +-------+---------------+---------------+---------------+-----------+-------+
ci-info: | 0 | 0.0.0.0 | 192.168.192.1 | 0.0.0.0 | eth0 | UG |
ci-info: | 1 | 192.168.192.0 | 0.0.0.0 | 255.255.192.0 | eth0 | U |
ci-info: +-------+---------------+---------------+---------------+-----------+-------+
For ping and ssh I get "Destination Host Unreachable" and "No route to host".
I also tried allocating a floating IP from the "federation" pool, but with that IP I just got time-outs for ping and ssh.
I read already:
wiki
fiware help
stackoverflow
Followed also the steps in this slideshow http://www.slideshare.net/fermingalan/developing-your-first-application-using-fi-ware-20130903
http://cosmos.lab.fi-ware.org/cosmos-gui/ seems to be down
EDIT: can use this one (need to use https and accept bad cert)
http://forge.fiware.org/plugins/mediawiki/wiki/fiware/index.php/FIWARE.OpenSpecification.Data.BigData_R4#Basic_concepts
http://catalogue.fiware.org/enablers/bigdata-analysis-cosmos/documentation - no info about it neither.
Any ideas? Perhaps there is a UI (other than the web page at https://cloud.lab.fiware.org/, which seems to be in early beta) for using FiWare that can do all the "anyway-mandatory" steps for users (developers)?
Maybe the problem is that I'm a software developer, not a network administrator, and perhaps this interface is meant for Linux network administrators.
The message "Error 101 Network is unreachable" shows that there was a problem in the VM network. node-int-net-01 is the shared network that is joined to the public network, while node-int-noinet-net-02 is meant to be joined to a network used for VPN. You shouldn't attach both networks to the same VM; use only node-int-net-01.
Status codes such as BUILD and ACTIVE come from OpenStack.
Regarding ping, you need to allow ICMP in the security group.
Anyway, if you continue having problems, you can send a mail to FIWARE Lab support at fiware-lab-help@lists.fiware.org, including your concrete data.
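As a small sanity check on the addressing shown in the cloud-init log from the question, the sketch below uses Python's ipaddress module to confirm that the instance address and the default gateway both fall inside the shared subnet 192.168.192.0/18. It only covers the private side; it says nothing about the floating IP or the security group rules.

```python
import ipaddress

# Values copied from the question: subnet list entry and ci-info tables.
subnet = ipaddress.ip_network("192.168.192.0/18")
eth0 = ipaddress.ip_interface("192.168.242.127/255.255.192.0")
gateway = ipaddress.ip_address("192.168.192.1")

print("eth0 network:     ", eth0.network)
print("eth0 in subnet:   ", eth0.ip in subnet)
print("gateway in subnet:", gateway in subnet)
```

Since the private addressing is consistent, the unreachable-host errors are more likely to come from the network attachment or the security group than from a wrong address on eth0.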
I made two virtual machines using VirtualBox, based on Ubuntu Server 12.04.4 LTS, one with APACHE-MYSQL and another with NGINX-MYSQL. The installed versions are those taken from the standard repository. Now I want to test a local site (the same on both machines) using AB. I noticed very different results for the total time to complete the operation. These are my results:
APACHE-MYSQL:
ab -n 100 -c 10 http://www.myrestsite.com/device/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking www.myrestsite.com (be patient).....done
Server Software: Apache/2.2.22
Server Hostname: www.myrestsite.com
Server Port: 80
Document Path: /device/
Document Length: 220 bytes
Concurrency Level: 10
Time taken for tests: 50.285 seconds
Complete requests: 100
Failed requests: 17
(Connect: 0, Receive: 0, Length: 17, Exceptions: 0)
Write errors: 0
Total transferred: 39801 bytes
HTML transferred: 22001 bytes
Requests per second: 1.99 [#/sec] (mean)
Time per request: 5028.547 [ms] (mean)
Time per request: 502.855 [ms] (mean, across all concurrent requests)
Transfer rate: 0.77 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 1 2.3 0 9
Processing: 5008 5023 25.6 5014 5127
Waiting: 4 17 25.3 8 121
Total: 5008 5024 26.3 5014 5128
Percentage of the requests served within a certain time (ms)
50% 5014
66% 5017
75% 5021
80% 5025
90% 5060
95% 5091
98% 5127
99% 5128
100% 5128 (longest request)
NGINX-MYSQL:
ab -n 100 -c 10 http://www.myrestsite.com/device/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking www.myrestsite.com (be patient).....done
Server Software: nginx/1.1.19
Server Hostname: www.myrestsite.com
Server Port: 80
Document Path: /device/
Document Length: 221 bytes
Concurrency Level: 10
Time taken for tests: 0.579 seconds
Complete requests: 100
Failed requests: 90
(Connect: 0, Receive: 0, Length: 90, Exceptions: 0)
Write errors: 0
Total transferred: 38598 bytes
HTML transferred: 21998 bytes
Requests per second: 172.75 [#/sec] (mean)
Time per request: 57.887 [ms] (mean)
Time per request: 5.789 [ms] (mean, across all concurrent requests)
Transfer rate: 65.11 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.3 0 2
Processing: 17 56 87.0 27 333
Waiting: 17 56 86.9 27 333
Total: 17 57 87.3 27 334
Percentage of the requests served within a certain time (ms)
50% 27
66% 30
75% 34
80% 39
90% 271
95% 322
98% 331
99% 334
100% 334 (longest request)
Is it normal that the same request needs 50.285 seconds on APACHE and 0.579 seconds on NGINX? Is such a difference between them possible? Thanks.
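To see how the summary figures in the two reports relate to each other, here is a small sketch that recomputes ab's derived metrics from the request count, the concurrency level and the total time. The function name ab_derived_metrics is made up for illustration; the formulas are the usual ones (requests per second = requests / time, mean time per request = concurrency * time / requests).

```python
def ab_derived_metrics(requests, concurrency, total_seconds):
    """Recompute ab's summary lines from the three raw inputs."""
    return {
        "requests_per_second": requests / total_seconds,
        "ms_per_request": concurrency * total_seconds * 1000 / requests,
        "ms_per_request_all": total_seconds * 1000 / requests,
    }


apache = ab_derived_metrics(100, 10, 50.285)
nginx = ab_derived_metrics(100, 10, 0.579)
print(apache)
print(nginx)

# Mean "Processing" minus mean "Waiting" from the Apache report, in ms:
# time spent after the first response byte arrived.
print(5023 - 17)
```

One reading of the Apache numbers, offered as a hypothesis rather than a diagnosis: "Waiting" (time to first byte) averages 17 ms while "Processing" averages about 5023 ms, so each request seems to be answered quickly and then held for roughly 5 seconds before ab counts it as finished. With 100 requests at concurrency 10, ten rounds of about 5 seconds give the 50 seconds total.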
I'm trying to monitor a process that uses CUDA and MPI, is there any way I could do this, something like the command "top" but that monitors the GPU too?
To get real-time insight on used resources, do:
nvidia-smi -l 1
This loops and refreshes the view every second.
If you do not want to keep past traces of the looped call in the console history, you can also do:
watch -n0.1 nvidia-smi
Where 0.1 is the time interval, in seconds.
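If you would rather collect the numbers in a script than watch the table, the same polling idea can be sketched in Python on top of nvidia-smi's CSV query mode. The helper names below are illustrative; the sketch assumes nvidia-smi is on PATH and that the four queried fields are supported by your driver. Fields reported as unsupported are returned as None.

```python
import subprocess
import time

QUERY = "index,utilization.gpu,memory.used,memory.total"


def _to_int(field):
    """Convert a CSV field to int; unsupported values become None."""
    try:
        return int(field)
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Parse output of --format=csv,noheader,nounits into dicts."""
    rows = []
    for line in text.strip().splitlines():
        index, util, used, total = [f.strip() for f in line.split(",")]
        rows.append({
            "index": _to_int(index),
            "util_percent": _to_int(util),
            "memory_used_mib": _to_int(used),
            "memory_total_mib": _to_int(total),
        })
    return rows


def read_gpus():
    """Run nvidia-smi once and return one dict per GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)


def watch_gpus(interval=1.0, iterations=5):
    """Print GPU readings every `interval` seconds, `iterations` times."""
    for _ in range(iterations):
        print(read_gpus())
        time.sleep(interval)
```

Calling watch_gpus() prints five readings one second apart; nothing runs until you call it.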
I find gpustat very useful. It can be installed with pip install gpustat, and prints breakdown of usage by processes or users.
I'm not aware of anything that combines this information, but you can use the nvidia-smi tool to get the raw data, like so (thanks to @jmsu for the tip on -l):
$ nvidia-smi -q -g 0 -d UTILIZATION -l
==============NVSMI LOG==============
Timestamp : Tue Nov 22 11:50:05 2011
Driver Version : 275.19
Attached GPUs : 2
GPU 0:1:0
Utilization
Gpu : 0 %
Memory : 0 %
Recently, I have written a monitoring tool called nvitop, the interactive NVIDIA-GPU process viewer.
It is written in pure Python and is easy to install.
Install from PyPI:
pip3 install --upgrade nvitop
Install the latest version from GitHub (recommended):
pip3 install git+https://github.com/XuehaiPan/nvitop.git#egg=nvitop
Run as a resource monitor:
nvitop -m
nvitop will show the GPU status like nvidia-smi but with additional fancy bars and history graphs.
For the processes, it will use psutil to collect process information and display the USER, %CPU, %MEM, TIME and COMMAND fields, which is much more detailed than nvidia-smi. Besides, it is responsive for user inputs in monitor mode. You can interrupt or kill your processes on the GPUs.
nvitop comes with a tree-view screen and an environment screen.
In addition, nvitop can be integrated into other applications. For example, integrate into PyTorch training code:
import os

from nvitop.core import host, CudaDevice, HostProcess, GpuProcess
from torch.utils.tensorboard import SummaryWriter

n_epochs = 10  # placeholder: set to the real number of training epochs

device = CudaDevice(0)
this_process = GpuProcess(os.getpid(), device)
writer = SummaryWriter()
for epoch in range(n_epochs):

    # some training code here
    # ...

    # refresh the cached GPU statistics for this process before reading them
    this_process.update_gpu_status()
    writer.add_scalars(
        'monitoring',
        {
            'device/memory_used': float(device.memory_used()) / (1 << 20),  # convert bytes to MiBs
            'device/memory_percent': device.memory_percent(),
            'device/memory_utilization': device.memory_utilization(),
            'device/gpu_utilization': device.gpu_utilization(),

            'host/cpu_percent': host.cpu_percent(),
            'host/memory_percent': host.virtual_memory().percent,

            'process/cpu_percent': this_process.cpu_percent(),
            'process/memory_percent': this_process.memory_percent(),
            'process/used_gpu_memory': float(this_process.gpu_memory()) / (1 << 20),  # convert bytes to MiBs
            'process/gpu_sm_utilization': this_process.gpu_sm_utilization(),
            'process/gpu_memory_utilization': this_process.gpu_memory_utilization(),
        },
        global_step=epoch,  # use the epoch index as the TensorBoard step
    )
See https://github.com/XuehaiPan/nvitop for more details.
Note: nvitop is dual-licensed by the GPLv3 License and Apache-2.0 License. Please feel free to use it as a dependency for your own projects. See Copyright Notice for more details.
Just use watch nvidia-smi; by default it refreshes the output every 2 seconds.
You can also use watch -n 5 nvidia-smi (-n 5 sets a 5-second interval).
Use the --query-compute-apps= argument:
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
For further help, run:
nvidia-smi --help-query-compute-apps
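The CSV output of that query is easy to consume from a script. The sketch below is one way to do it, with made-up helper names; it uses the same three fields as the command above and adds noheader,nounits to the format so each row is just pid, process name and memory in MiB.

```python
import csv
import io
import subprocess


def parse_compute_apps(text):
    """Parse 'pid, process_name, used_memory' rows (noheader, nounits)."""
    apps = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        pid, name, used = row
        try:
            used_mib = int(used)
        except ValueError:
            used_mib = None  # e.g. a value the driver reports as unavailable
        apps.append({"pid": int(pid), "name": name, "used_mib": used_mib})
    return apps


def read_compute_apps():
    """Run nvidia-smi once and return the compute processes it lists."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_compute_apps(result.stdout)
```

read_compute_apps() requires nvidia-smi on PATH; parse_compute_apps() can be tried on saved output without a GPU.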
You can try nvtop, which is similar to the widely used htop tool but for NVIDIA GPUs.
Download and install the latest stable CUDA driver (4.2) from here. On Linux, nvidia-smi 295.41 gives you just what you want. Use nvidia-smi:
[root@localhost release]# nvidia-smi
Wed Sep 26 23:16:16 2012
+------------------------------------------------------+
| NVIDIA-SMI 3.295.41 Driver Version: 295.41 |
|-------------------------------+----------------------+----------------------+
| Nb. Name | Bus Id Disp. | Volatile ECC SB / DB |
| Fan Temp Power Usage /Cap | Memory Usage | GPU Util. Compute M. |
|===============================+======================+======================|
| 0. Tesla C2050 | 0000:05:00.0 On | 0 0 |
| 30% 62 C P0 N/A / N/A | 3% 70MB / 2687MB | 44% Default |
|-------------------------------+----------------------+----------------------|
| Compute processes: GPU Memory |
| GPU PID Process name Usage |
|=============================================================================|
| 0. 7336 ./align 61MB |
+-----------------------------------------------------------------------------+
EDIT: In latest NVIDIA drivers, this support is limited to Tesla Cards.
Another useful monitoring approach is to use ps filtered on processes that consume your GPUs. I use this one a lot:
ps f -o user,pgrp,pid,pcpu,pmem,start,time,command -p `lsof -n -w -t /dev/nvidia*`
That'll show all nvidia GPU-utilizing processes and some stats about them. lsof ... retrieves a list of all processes using an nvidia GPU owned by the current user, and ps -p ... shows ps results for those processes. ps f shows nice formatting for child/parent process relationships / hierarchies, and -o specifies a custom formatting. That one is similar to just doing ps u but adds the process group ID and removes some other fields.
One advantage of this over nvidia-smi is that it'll show process forks as well as main processes that use the GPU.
One disadvantage, though, is it's limited to processes owned by the user that executes the command. To open it up to all processes owned by any user, I add a sudo before the lsof.
Lastly, I combine it with watch to get a continuous update. So, in the end, it looks like:
watch -n 0.1 'ps f -o user,pgrp,pid,pcpu,pmem,start,time,command -p `sudo lsof -n -w -t /dev/nvidia*`'
Which has output like:
Every 0.1s: ps f -o user,pgrp,pid,pcpu,pmem,start,time,command -p `sudo lsof -n -w -t /dev/nvi... Mon Jun 6 14:03:20 2016
USER PGRP PID %CPU %MEM STARTED TIME COMMAND
grisait+ 27294 50934 0.0 0.1 Jun 02 00:01:40 /opt/google/chrome/chrome --type=gpu-process --channel=50877.0.2015482623
grisait+ 27294 50941 0.0 0.0 Jun 02 00:00:00 \_ /opt/google/chrome/chrome --type=gpu-broker
grisait+ 53596 53596 36.6 1.1 13:47:06 00:05:57 python -u process_examples.py
grisait+ 53596 33428 6.9 0.5 14:02:09 00:00:04 \_ python -u process_examples.py
grisait+ 53596 33773 7.5 0.5 14:02:19 00:00:04 \_ python -u process_examples.py
grisait+ 53596 34174 5.0 0.5 14:02:30 00:00:02 \_ python -u process_examples.py
grisait+ 28205 28205 905 1.5 13:30:39 04:56:09 python -u train.py
grisait+ 28205 28387 5.8 0.4 13:30:49 00:01:53 \_ python -u train.py
grisait+ 28205 28388 5.3 0.4 13:30:49 00:01:45 \_ python -u train.py
grisait+ 28205 28389 4.5 0.4 13:30:49 00:01:29 \_ python -u train.py
grisait+ 28205 28390 4.5 0.4 13:30:49 00:01:28 \_ python -u train.py
grisait+ 28205 28391 4.8 0.4 13:30:49 00:01:34 \_ python -u train.py
This may not be elegant, but you can try
while true; do sleep 2; nvidia-smi; done
I also tried the method by @Edric, which works, but I prefer the original layout of nvidia-smi.
You can use the monitoring program glances with its GPU monitoring plug-in:
open source
to install: sudo apt-get install -y python-pip; sudo pip install glances[gpu]
to launch: sudo glances
It also monitors the CPU, disk IO, disk space, network, and a few other things.
In Linux Mint, and most likely Ubuntu, you can try "nvidia-smi --loop=1"
If you just want to find the processes that are running on the GPU, you can simply use the following command:
lsof /dev/nvidia*
For me, nvidia-smi and watch -n 1 nvidia-smi are enough in most cases. Sometimes nvidia-smi shows no process but the GPU memory is used up, so I need the above command to find the processes.
I created a batch file with the following code in a windows machine to monitor every second. It works for me.
:loop
cls
"C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi"
timeout /T 1
goto loop
The nvidia-smi executable is usually located under "C:\Program Files\NVIDIA Corporation", if you want to run the command only once.
You can use nvidia-smi pmon -i 0 to monitor every process on GPU 0,
including compute mode, SM usage, memory usage, encoder usage, and decoder usage.
There is the Prometheus GPU Metrics Exporter (PGME), which leverages the nvidia-smi binary. You may try this out. Once you have the exporter running, you can access it via http://localhost:9101/metrics. For two GPUs, the sample result looks like this:
temperature_gpu{gpu="TITAN X (Pascal)[0]"} 41
utilization_gpu{gpu="TITAN X (Pascal)[0]"} 0
utilization_memory{gpu="TITAN X (Pascal)[0]"} 0
memory_total{gpu="TITAN X (Pascal)[0]"} 12189
memory_free{gpu="TITAN X (Pascal)[0]"} 12189
memory_used{gpu="TITAN X (Pascal)[0]"} 0
temperature_gpu{gpu="TITAN X (Pascal)[1]"} 78
utilization_gpu{gpu="TITAN X (Pascal)[1]"} 95
utilization_memory{gpu="TITAN X (Pascal)[1]"} 59
memory_total{gpu="TITAN X (Pascal)[1]"} 12189
memory_free{gpu="TITAN X (Pascal)[1]"} 1738
memory_used{gpu="TITAN X (Pascal)[1]"} 10451
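If you want to read those metrics from a script rather than from Prometheus itself, the lines can be parsed directly. The sketch below is not a general Prometheus parser: it assumes exactly the line shape shown in the sample (metric name, a single gpu label, a numeric value), and the function name is made up.

```python
import re

# name{gpu="label"} value, as in the sample output above
SAMPLE_LINE = re.compile(r'^(?P<name>\w+)\{gpu="(?P<gpu>[^"]*)"\}\s+(?P<value>\S+)$')


def parse_gpu_metrics(text):
    """Group metric values by GPU label: {gpu: {metric: value}}."""
    metrics = {}
    for line in text.splitlines():
        match = SAMPLE_LINE.match(line.strip())
        if not match:
            continue  # comments, blank lines, other label shapes
        metrics.setdefault(match["gpu"], {})[match["name"]] = float(match["value"])
    return metrics


sample = (
    'temperature_gpu{gpu="TITAN X (Pascal)[1]"} 78\n'
    'utilization_gpu{gpu="TITAN X (Pascal)[1]"} 95\n'
    'memory_used{gpu="TITAN X (Pascal)[1]"} 10451\n'
)
print(parse_gpu_metrics(sample))
```

The text would normally come from an HTTP GET on the exporter's /metrics URL; the fetch is left out so the example runs without an exporter.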
Run nvidia-smi in device monitoring mode, e.g.:
$ nvidia-smi dmon -d 3 -s pcvumt
# gpu pwr gtemp mtemp mclk pclk pviol tviol sm mem enc dec fb bar1 rxpci txpci
# Idx W C C MHz MHz % bool % % % % MB MB MB/s MB/s
0 273 54 - 9501 2025 0 0 100 11 0 0 18943 75 5906 659
0 280 54 - 9501 2025 0 0 100 11 0 0 18943 75 7404 650
0 277 54 - 9501 2025 0 0 100 11 0 0 18943 75 7386 719
0 279 55 - 9501 2025 0 0 99 11 0 0 18945 75 6592 692
0 281 55 - 9501 2025 0 0 99 11 0 0 18945 75 7760 641
0 279 55 - 9501 2025 0 0 99 11 0 0 18945 75 7775 668
0 279 55 - 9501 2025 0 0 100 11 0 0 18947 75 7589 690
0 281 55 - 9501 2025 0 0 99 12 0 0 18947 75 7514 657
0 279 55 - 9501 2025 0 0 100 11 0 0 18947 75 6472 558
0 280 54 - 9501 2025 0 0 100 11 0 0 18947 75 7066 683
Full details are in man nvidia-smi.
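The dmon table is also straightforward to post-process. Below is a minimal sketch, with an illustrative function name, that turns saved dmon output into one dict per sample. It assumes the layout shown above: a first comment line with column names, a second comment line with units, then whitespace-separated rows where "-" means no value.

```python
def parse_dmon(text):
    """Parse `nvidia-smi dmon` output into a list of dicts keyed by column name."""
    columns = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # The first '#' line names the columns; later ones (units) are skipped.
            if columns is None:
                columns = line.lstrip("#").split()
            continue
        if columns is None:
            continue  # data before any header cannot be labelled
        values = [None if v == "-" else float(v) for v in line.split()]
        rows.append(dict(zip(columns, values)))
    return rows


sample = (
    "# gpu pwr gtemp mtemp mclk pclk pviol tviol sm mem enc dec fb bar1 rxpci txpci\n"
    "# Idx W C C MHz MHz % bool % % % % MB MB MB/s MB/s\n"
    "0 273 54 - 9501 2025 0 0 100 11 0 0 18943 75 5906 659\n"
    "0 280 54 - 9501 2025 0 0 100 11 0 0 18943 75 7404 650\n"
)
for row in parse_dmon(sample):
    print(row["gpu"], row["pwr"], row["sm"], row["fb"])
```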