how to fix CUDA cuInit: Unknown CUDA error value? - cuda

I using colab on blender rendering GPU
suddenly it appear warning
how i fix it out?
CUDA cuInit: Unknown CUDA error value
F0523 17:11:21.172250 499 device.cpp:402] Device does not support queues.
*** Check failure stack trace: ***
# 0x2b6cefd (unknown)
# 0x2b6ee03 (unknown)
# 0x2b6ca8d (unknown)
# 0x2b6f6d9 (unknown)
# 0x2b8e120 ccl::Device::gpu_queue_create()
# 0x3390629 ccl::PathTraceWorkGPU::PathTraceWorkGPU()
# 0x338ad04 ccl::PathTraceWork::create()
# 0x33843e2 (unknown)
# 0x33846c6 ccl::PathTrace::PathTrace()
# 0x331b6e1 ccl::Session::Session()
# 0x2b0cdd5 ccl::BlenderSession::create_session()
# 0x2b0d990 ccl::BlenderSession::reset_session()
# 0x2b0579d (unknown)
# 0x9eca874 (unknown)
# 0x9e84cd8 _PyObject_MakeTpCall
# 0x112b66f _PyEval_EvalFrameDefault
# 0x9f43364 (unknown)
# 0x112b61a _PyEval_EvalFrameDefault
# 0x9f43364 (unknown)
# 0x9e8495f PyVectorcall_Call
# 0x1a4d828 (unknown)
# 0x19c0d66 (unknown)
# 0x977aec9 (unknown)
# 0x977bb04 RE_engine_render
# 0x977f637 (unknown)
# 0x97826a0 (unknown)
# 0x978325c RE_RenderAnim
# 0x112e010 (unknown)
# 0xb26e407 BLI_args_parse
# 0x101c46b main
# 0x7f34eb952c87 __libc_start_main
# 0x112d97c (unknown)

cuInit() initializes the CUDA driver. It's not about what your device supports or doesn't support. Check your NVIDIA driver installation and whether your version of libcuda corresponds to it. In particualr, run nvidia-smi to check which driver is loaded, if at all, and which GPUs are visible.

Related

virt-install and qemu-system-aarch64: cannot create vmnet interface: general failure (possibly not enough privileges)

I'm trying to virt-install the following:
sudo virt-install \ 1
--name host1 \
--memory 2048 \
--vcpus 2 \
--disk size=30 \
--cdrom ./box.img \
--os-variant ubuntu22.04 \
--virt-type hvf \
--qemu-commandline='-M highmem=off -netdev vmnet-shared,id=net0 -device virtio-net-device,netdev=net0,mac=54:54:00:55:54:51' \
--network user
and I got the following error:
ERROR internal error: process exited while connecting to monitor: 2023-01-12T01:08:04.782892Z qemu-system-aarch64: -netdev vmnet-shared,id=net0: cannot create vmnet interface: general failure (possibly not enough privileges)
I've tried to run both libvirtd manually and via the brew services, and I got the same error.
# when I run as a local user
/opt/homebrew/opt/libvirt/sbin/libvirtd -f /opt/homebrew/etc/libvirt/libvirtd.conf
# via homebrew services
◼ ~ $ brew services
Name Status User File
libvirt started root ~/Library/LaunchAgents/homebrew.mxcl.libvirt.plist
and this is the libvirtd.conf:
# Master libvirt daemon configuration file
#
#################################################################
#
# Network connectivity controls
#
# Flag listening for secure TLS connections on the public TCP/IP port.
#
# To enable listening sockets with the 'libvirtd' daemon it's also required to
# pass the '--listen' flag on the commandline of the daemon.
# This is not needed with 'virtproxyd'.
#
# This setting is not required or honoured if using systemd socket
# activation.
#
# It is necessary to setup a CA and issue server certificates before
# using this capability.
#
# This is enabled by default, uncomment this to disable it
#listen_tls = 0
# Listen for unencrypted TCP connections on the public TCP/IP port.
#
# To enable listening sockets with the 'libvirtd' daemon it's also required to
# pass the '--listen' flag on the commandline of the daemon.
# This is not needed with 'virtproxyd'.
#
# This setting is not required or honoured if using systemd socket
# activation.
#
# Using the TCP socket requires SASL authentication by default. Only
# SASL mechanisms which support data encryption are allowed. This is
# DIGEST_MD5 and GSSAPI (Kerberos5)
#
# This is disabled by default, uncomment this to enable it.
#listen_tcp = 1
# Override the port for accepting secure TLS connections
# This can be a port number, or service name
#
# This setting is not required or honoured if using systemd socket
# activation.
#
#tls_port = "16514"
# Override the port for accepting insecure TCP connections
# This can be a port number, or service name
#
# This setting is not required or honoured if using systemd socket
# activation.
#
#tcp_port = "16509"
# Override the default configuration which binds to all network
# interfaces. This can be a numeric IPv4/6 address, or hostname
#
# This setting is not required or honoured if using systemd socket
# activation.
#
# If the libvirtd service is started in parallel with network
# startup (e.g. with systemd), binding to addresses other than
# the wildcards (0.0.0.0/::) might not be available yet.
#
#listen_addr = "192.168.0.1"
#################################################################
#
# UNIX socket access controls
#
# Set the UNIX domain socket group ownership. This can be used to
# allow a 'trusted' set of users access to management capabilities
# without becoming root.
#
# This setting is not required or honoured if using systemd socket
# activation.
#
# This is restricted to 'root' by default.
#unix_sock_group = "libvirt"
# Set the UNIX socket permissions for the R/O socket. This is used
# for monitoring VM status only
#
# This setting is not required or honoured if using systemd socket
# activation.
#
# Default allows any user. If setting group ownership, you may want to
# restrict this too.
unix_sock_ro_perms = "0777"
# Set the UNIX socket permissions for the R/W socket. This is used
# for full management of VMs
#
# This setting is not required or honoured if using systemd socket
# activation.
#
# Default allows only root. If PolicyKit is enabled on the socket,
# the default will change to allow everyone (eg, 0777)
#
# If not using PolicyKit and setting group ownership for access
# control, then you may want to relax this too.
unix_sock_rw_perms = "0770"
# Set the UNIX socket permissions for the admin interface socket.
#
# This setting is not required or honoured if using systemd socket
# activation.
#
# Default allows only owner (root), do not change it unless you are
# sure to whom you are exposing the access to.
unix_sock_admin_perms = "0700"
# Set the name of the directory in which sockets will be found/created.
#
# This setting is not required or honoured if using systemd socket
# activation.
#
unix_sock_dir = "/opt/homebrew/var/run/libvirt"
#################################################################
#
# Authentication.
#
# There are the following choices available:
#
# - none: do not perform auth checks. If you can connect to the
# socket you are allowed. This is suitable if there are
# restrictions on connecting to the socket (eg, UNIX
# socket permissions), or if there is a lower layer in
# the network providing auth (eg, TLS/x509 certificates)
#
# - sasl: use SASL infrastructure. The actual auth scheme is then
# controlled from /opt/homebrew/etc/sasl2/libvirt.conf. For the TCP
# socket only GSSAPI & DIGEST-MD5 mechanisms will be used.
# For non-TCP or TLS sockets, any scheme is allowed.
#
# - polkit: use PolicyKit to authenticate. This is only suitable
# for use on the UNIX sockets. The default policy will
# require a user to supply their own password to gain
# full read/write access (aka sudo like), while anyone
# is allowed read/only access.
#
# Set an authentication scheme for UNIX read-only sockets
#
# By default socket permissions allow anyone to connect
#
# If libvirt was compiled without support for 'polkit', then
# no access control checks are done, but libvirt still only
# allows execution of APIs which don't change state.
#
# If libvirt was compiled with support for 'polkit', then
# the libvirt socket will perform a check with polkit after
# connections. The default policy still allows any local
# user access.
#
# To restrict monitoring of domains you may wish to either
# enable 'sasl' here, or change the polkit policy definition.
#auth_unix_ro = "none"
# Set an authentication scheme for UNIX read-write sockets.
#
# If libvirt was compiled without support for 'polkit', then
# the systemd .socket files will use SocketMode=0600 by default
# thus only allowing root user to connect, and 'auth_unix_rw'
# will default to 'none'.
#
# If libvirt was compiled with support for 'polkit', then
# the systemd .socket files will use SocketMode=0666 which
# allows any user to connect and 'auth_unix_rw' will default
# to 'polkit'. If you disable use of 'polkit' here, then it
# is essential to change the systemd SocketMode parameter
# back to 0600, to avoid an insecure configuration.
#
#auth_unix_rw = "none"
# Change the authentication scheme for TCP sockets.
#
# If you don't enable SASL, then all TCP traffic is cleartext.
# Don't do this outside of a dev/test scenario. For real world
# use, always enable SASL and use the GSSAPI or DIGEST-MD5
# mechanism in /opt/homebrew/etc/sasl2/libvirt.conf
#auth_tcp = "sasl"
# Change the authentication scheme for TLS sockets.
#
# TLS sockets already have encryption provided by the TLS
# layer, and limited authentication is done by certificates
#
# It is possible to make use of any SASL authentication
# mechanism as well, by using 'sasl' for this option
#auth_tls = "none"
# Enforce a minimum SSF value for TCP sockets
#
# The default minimum is currently 56 (single-DES) which will
# be raised to 112 in the future.
#
# This option can be used to set values higher than 112
#tcp_min_ssf = 112
# Change the API access control scheme
#
# By default an authenticated user is allowed access
# to all APIs. Access drivers can place restrictions
# on this. By default the 'nop' driver is enabled,
# meaning no access control checks are done once a
# client has authenticated with libvirtd
#
#access_drivers = [ "polkit" ]
#################################################################
#
# TLS x509 certificate configuration
#
# Use of TLS requires that x509 certificates be issued. The default locations
# for the certificate files is as follows:
#
# /opt/homebrew/etc/pki/CA/cacert.pem - The CA master certificate
# /opt/homebrew/etc/pki/libvirt/servercert.pem - The server certificate signed by cacert.pem
# /opt/homebrew/etc/pki/libvirt/private/serverkey.pem - The server private key
#
# It is possible to override the default locations by altering the 'key_file',
# 'cert_file', and 'ca_file' values and uncommenting them below.
#
# NB, overriding the default of one location requires uncommenting and
# possibly additionally overriding the other settings.
#
# Override the default server key file path
#
#key_file = "/opt/homebrew/etc/pki/libvirt/private/serverkey.pem"
# Override the default server certificate file path
#
#cert_file = "/opt/homebrew/etc/pki/libvirt/servercert.pem"
# Override the default CA certificate path
#
#ca_file = "/opt/homebrew/etc/pki/CA/cacert.pem"
# Specify a certificate revocation list.
#
# Defaults to not using a CRL, uncomment to enable it
#crl_file = "/opt/homebrew/etc/pki/CA/crl.pem"
#################################################################
#
# Authorization controls
#
# Flag to disable verification of our own server certificates
#
# When libvirtd starts it performs some sanity checks against
# its own certificates.
#
# Default is to always run sanity checks. Uncommenting this
# will disable sanity checks which is not a good idea
#tls_no_sanity_certificate = 1
# Flag to disable verification of client certificates
#
# Client certificate verification is the primary authentication mechanism.
# Any client which does not present a certificate signed by the CA
# will be rejected.
#
# Default is to always verify. Uncommenting this will disable
# verification.
#tls_no_verify_certificate = 1
# An access control list of allowed x509 Distinguished Names
# This list may contain wildcards such as
#
# "C=GB,ST=London,L=London,O=Red Hat,CN=*"
#
# Any * matches any number of consecutive spaces, like a simplified glob(7).
#
# The format of the DN for a particular certificate can be queried
# using:
#
# virt-pki-query-dn clientcert.pem
#
# NB If this is an empty list, no client can connect, so comment out
# entirely rather than using empty list to disable these checks
#
# By default, no DN's are checked
#tls_allowed_dn_list = ["DN1", "DN2"]
# Override the compile time default TLS priority string. The
# default is usually "NORMAL" unless overridden at build time.
# Only set this is it is desired for libvirt to deviate from
# the global default settings.
#
#tls_priority="NORMAL"
# An access control list of allowed SASL usernames. The format for username
# depends on the SASL authentication mechanism. Kerberos usernames
# look like username#REALM
#
# This list may contain wildcards such as
#
# "*#EXAMPLE.COM"
#
# See the g_pattern_match function for the format of the wildcards.
#
# https://developer.gnome.org/glib/stable/glib-Glob-style-pattern-matching.html
#
# NB If this is an empty list, no client can connect, so comment out
# entirely rather than using empty list to disable these checks
#
# By default, no Username's are checked
#sasl_allowed_username_list = ["joe#EXAMPLE.COM", "fred#EXAMPLE.COM" ]
#################################################################
#
# Processing controls
#
# The maximum number of concurrent client connections to allow
# over all sockets combined.
#max_clients = 5000
# The maximum length of queue of connections waiting to be
# accepted by the daemon. Note, that some protocols supporting
# retransmission may obey this so that a later reattempt at
# connection succeeds.
#max_queued_clients = 1000
# The maximum length of queue of accepted but not yet
# authenticated clients. The default value is 20. Set this to
# zero to turn this feature off.
#max_anonymous_clients = 20
# The minimum limit sets the number of workers to start up
# initially. If the number of active clients exceeds this,
# then more threads are spawned, up to max_workers limit.
# Typically you'd want max_workers to equal maximum number
# of clients allowed
#min_workers = 5
#max_workers = 20
# The number of priority workers. If all workers from above
# pool are stuck, some calls marked as high priority
# (notably domainDestroy) can be executed in this pool.
#prio_workers = 5
# Limit on concurrent requests from a single client
# connection. To avoid one client monopolizing the server
# this should be a small fraction of the global max_workers
# parameter.
#max_client_requests = 5
# Same processing controls, but this time for the admin interface.
# For description of each option, be so kind to scroll few lines
# upwards.
#admin_min_workers = 1
#admin_max_workers = 5
#admin_max_clients = 5
#admin_max_queued_clients = 5
#admin_max_client_requests = 5
#################################################################
#
# Logging controls
#
# Logging level: 4 errors, 3 warnings, 2 information, 1 debug
# basically 1 will log everything possible
#
# WARNING: USE OF THIS IS STRONGLY DISCOURAGED.
#
# WARNING: It outputs too much information to practically read.
# WARNING: The "log_filters" setting is recommended instead.
#
# WARNING: Journald applies rate limiting of messages and so libvirt
# WARNING: will limit "log_level" to only allow values 3 or 4 if
# WARNING: journald is the current output.
#
# WARNING: USE OF THIS IS STRONGLY DISCOURAGED.
#log_level = 3
# Logging filters:
# A filter allows to select a different logging level for a given category
# of logs. The format for a filter is:
#
# level:match
#
# where 'match' is a string which is matched against the category
# given in the VIR_LOG_INIT() at the top of each libvirt source
# file, e.g., "remote", "qemu", or "util.json". The 'match' in the
# filter matches using shell wildcard syntax (see 'man glob(7)').
# The 'match' is always treated as a substring match. IOW a match
# string 'foo' is equivalent to '*foo*'.
#
# 'level' is the minimal level where matching messages should
# be logged:
#
# 1: DEBUG
# 2: INFO
# 3: WARNING
# 4: ERROR
#
# Multiple filters can be defined in a single #log_filters, they just need
# to be separated by spaces. Note that libvirt performs "first" match, i.e.
# if there are concurrent filters, the first one that matches will be applied,
# given the order in #log_filters.
#
# A typical need is to capture information from a hypervisor driver,
# public API entrypoints and some of the utility code. Some utility
# code is very verbose and is generally not desired. Taking the QEMU
# hypervisor as an example, a suitable filter string for debugging
# might be to turn off object, json & event logging, but enable the
# rest of the util code:
#
#log_filters="1:qemu 1:libvirt 4:object 4:json 4:event 1:util"
# Logging outputs:
# An output is one of the places to save logging information
# The format for an output can be:
# level:stderr
# output goes to stderr
# level:syslog:name
# use syslog for the output and use the given name as the ident
# level:file:file_path
# output to a file, with the given filepath
# level:journald
# output to journald logging system
# In all cases 'level' is the minimal priority, acting as a filter
# 1: DEBUG
# 2: INFO
# 3: WARNING
# 4: ERROR
#
# Multiple outputs can be defined, they just need to be separated by spaces.
# e.g. to log all warnings and errors to syslog under the libvirtd ident:
#log_outputs="3:syslog:libvirtd"
##################################################################
#
# Auditing
#
# This setting allows usage of the auditing subsystem to be altered:
#
# audit_level == 0 -> disable all auditing
# audit_level == 1 -> enable auditing, only if enabled on host (default)
# audit_level == 2 -> enable auditing, and exit if disabled on host
#
#audit_level = 2
#
# If set to 1, then audit messages will also be sent
# via libvirt logging infrastructure. Defaults to 0
#
#audit_logging = 1
###################################################################
# UUID of the host:
# Host UUID is read from one of the sources specified in host_uuid_source.
#
# - 'smbios': fetch the UUID from 'dmidecode -s system-uuid'
# - 'machine-id': fetch the UUID from /etc/machine-id
#
# The host_uuid_source default is 'smbios'. If 'dmidecode' does not provide
# a valid UUID a temporary UUID will be generated.
#
# Another option is to specify host UUID in host_uuid.
#
# Keep the format of the example UUID below. UUID must not have all digits
# be the same.
# NB This default all-zeros UUID will not work. Replace
# it with the output of the 'uuidgen' command and then
# uncomment this entry
#host_uuid = "00000000-0000-0000-0000-000000000000"
#host_uuid_source = "smbios"
###################################################################
# Keepalive protocol:
# This allows libvirtd to detect broken client connections or even
# dead clients. A keepalive message is sent to a client after
# keepalive_interval seconds of inactivity to check if the client is
# still responding; keepalive_count is a maximum number of keepalive
# messages that are allowed to be sent to the client without getting
# any response before the connection is considered broken. In other
# words, the connection is automatically closed approximately after
# keepalive_interval * (keepalive_count + 1) seconds since the last
# message received from the client. If keepalive_interval is set to
# -1, libvirtd will never send keepalive requests; however clients
# can still send them and the daemon will send responses. When
# keepalive_count is set to 0, connections will be automatically
# closed after keepalive_interval seconds of inactivity without
# sending any keepalive messages.
#
#keepalive_interval = 5
#keepalive_count = 5
#
# These configuration options are no longer used. There is no way to
# restrict such clients from connecting since they first need to
# connect in order to ask for keepalive.
#
#keepalive_required = 1
#admin_keepalive_required = 1
# Keepalive settings for the admin interface
#admin_keepalive_interval = 5
#admin_keepalive_count = 5
###################################################################
# Open vSwitch:
# This allows to specify a timeout for openvswitch calls made by
# libvirt. The ovs-vsctl utility is used for the configuration and
# its timeout option is set by default to 5 seconds to avoid
# potential infinite waits blocking libvirt.
#
#
Now, to make sure that was a privilege error, I ran:
qemu-system-aarch64 -netdev vmnet-shared,id=net0 -machine virt-2.10
which reproduces the error, but:
sudo qemu-system-aarch64 -netdev vmnet-shared,id=net0 -machine virt-2.10
opens a qemu window and I got:
qemu-system-aarch64: warning: netdev net0 has no peer
alright, how could I fix libvirt on Mac OS, installed via homebrew?
$ brew info libvirt 1
==> libvirt: stable 8.10.0 (bottled), HEAD
C virtualization API
https://libvirt.org/
/opt/homebrew/Cellar/libvirt/8.10.0 (587 files, 40.8MB) *
Poured from bottle on 2023-01-11 at 19:20:38
From: https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/libvirt.rb
License: LGPL-2.1-or-later and GPL-2.0-or-later
==> Dependencies
Build: docutils ✘, meson ✘, ninja ✘, perl ✘, pkg-config ✔, python#3.11 ✔, rpcgen ✘
Required: gettext ✔, glib ✔, gnu-sed ✔, gnutls ✔, grep ✔, libgcrypt ✔, libiscsi ✔, libssh2 ✔, yajl ✔
==> Options
--HEAD
Install HEAD version
==> Caveats
To restart libvirt after an upgrade:
brew services restart libvirt
Or, if you don't want/need a background service you can just run:
/opt/homebrew/opt/libvirt/sbin/libvirtd -f /opt/homebrew/etc/libvirt/libvirtd.conf
==> Analytics
install: 4,452 (30 days), 18,333 (90 days), 69,415 (365 days)
install-on-request: 3,222 (30 days), 13,494 (90 days), 52,022 (365 days)
build-error: 4 (30 days)
Update
Maybe I've found the source of the problem here, if anyone could confirm

zabbix agent : cannot accept incoming connection for peer: frontend

Im running a zabbix agent on my server and i have this problem with it : when the server tries to connect to it i get the following error message in /var/log/zabbix/zabbix_agent2.log :
2022/06/30 18:35:38.627607 cannot accept incoming connection for peer: 172.16.238.2
2022/06/30 18:35:52.433324 [101] In refreshActiveChecks() from [172.16.239.40:10051]
2022/06/30 18:35:52.433379 connecting to [172.16.239.40:10051] [timeout:3s, connection timeout:3s]
2022/06/30 18:35:52.433616 sending [{"request":"active checks","host":"Zabbix server","version":"6.0"}] to [172.16.239.40:10051]
2022/06/30 18:35:52.433971 receiving data from [172.16.239.40:10051]
2022/06/30 18:35:52.451945 received [{"response":"success","data":[]}] from [172.16.239.40:10051]
2022/06/30 18:35:52.452089 [101] End of refreshActiveChecks() from [172.16.239.40:10051]
2022/06/30 18:35:52.452104 [101] processing update request (0 requests)
2022/06/30 18:35:52.452109 [101] skipping empty update for unregistered client
2022/06/30 18:36:38.672626 cannot accept incoming connection for peer: 172.16.238.2
You notice this is a problem with the frontend (172.16.238.2), but the backend is ok (172.16.239.40).
How can i resolve this ? I have tried to set DebugLevel=5 to have more details but it's the same.
Both agent and server are on the same host machine.
This is my conf file (pretty much default except server ip)
############ GENERAL PARAMETERS #################
### Option: PidFile
# Name of PID file.
#
# Mandatory: no
# Default:
# PidFile=/tmp/zabbix_agent2.pid
PidFile=/var/run/zabbix/zabbix_agent2.pid
DebugLevel=5
### Option: LogType
# Specifies where log messages are written to:
# system - syslog
# file - file specified with LogFile parameter
# console - standard output
#
# Mandatory: no
# Default:
# LogType=file
### Option: LogFile
# Log file name for LogType 'file' parameter.
#
# Mandatory: yes, if LogType is set to file, otherwise no
# Default:
# LogFile=/tmp/zabbix_agent2.log
LogFile=/var/log/zabbix/zabbix_agent2.log
### Option: LogFileSize
# Maximum size of log file in MB.
# 0 - disable automatic log rotation.
#
# Mandatory: no
# Range: 0-1024
# Default:
# LogFileSize=1
LogFileSize=0
### Option: DebugLevel
# Specifies debug level:
# 0 - basic information about starting and stopping of Zabbix processes
# 1 - critical information
# 2 - error information
# 3 - warnings
# 4 - for debugging (produces lots of information)
# 5 - extended debugging (produces even more information)
#
# Mandatory: no
# Range: 0-5
# Default:
# DebugLevel=3
### Option: SourceIP
# Source IP address for outgoing connections.
#
# Mandatory: no
# Default:
# SourceIP=
##### Passive checks related
### Option: Server
# List of comma delimited IP addresses, optionally in CIDR notation, or DNS names of Zabbix servers and Zabbix proxies.
# Incoming connections will be accepted only from the hosts listed here.
# If IPv6 support is enabled then '127.0.0.1', '::127.0.0.1', '::ffff:127.0.0.1' are treated equally
# and '::/0' will allow any IPv4 or IPv6 address.
# '0.0.0.0/0' can be used to allow any IPv4 address.
# Example: Server=172.16.239.40,192.168.1.0/24,::1,2001:db8::/32,zabbix.example.com
#
# Mandatory: yes, if StartAgents is not explicitly set to 0
# Default:
# Server=
Server=172.16.239.40
### Option: ListenPort
# Agent will listen on this port for connections from the server.
#
# Mandatory: no
# Range: 1024-32767
# Default:
# ListenPort=10050
### Option: ListenIP
# List of comma delimited IP addresses that the agent should listen on.
# First IP address is sent to Zabbix server if connecting to it to retrieve list of active checks.
#
# Mandatory: no
# Default:
# ListenIP=0.0.0.0
### Option: StatusPort
# Agent will listen on this port for HTTP status requests.
#
# Mandatory: no
# Range: 1024-32767
# Default:
# StatusPort=
##### Active checks related
### Option: ServerActive
# List of comma delimited IP addresses or DNS names (address:port) pairs or clusters (address:port;address2:port) of Zabbix servers and Zabbix proxies for active checks.
# If port is not specified, default port is used.
# Cluster nodes need be separated by semicolon.
# IPv6 addresses must be enclosed in square brackets if port for that host is specified.
# If port is not specified, square brackets for IPv6 addresses are optional.
# If this parameter is not specified, active checks are disabled.
# Example for multiple servers:
# ServerActive=172.16.239.40:20051,zabbix.domain,[::1]:30051,::1,[12fc::1]
# Example for HA:
# ServerActive=zabbix.cluster.node1;zabbix.cluster.node2:20051;zabbix.cluster.node3
# Example for HA with two clusters and one server:
# ServerActive=zabbix.cluster.node1;zabbix.cluster.node2:20051,zabbix.cluster2.node1;zabbix.cluster2.node2,zabbix.domain
#
# Mandatory: no
# Default:
# ServerActive=
ServerActive=172.16.239.40
### Option: Hostname
# List of comma delimited unique, case sensitive hostnames.
# Required for active checks and must match hostnames as configured on the server.
# Value is acquired from HostnameItem if undefined.
#
# Mandatory: no
# Default:
# Hostname=
Hostname=Zabbix server
### Option: HostnameItem
# Item used for generating Hostname if it is undefined. Ignored if Hostname is defined.
# Does not support UserParameters or aliases.
#
# Mandatory: no
# Default:
# HostnameItem=system.hostname
### Option: HostMetadata
# Optional parameter that defines host metadata.
# Host metadata is used at host auto-registration process.
# An agent will issue an error and not start if the value is over limit of 255 characters.
# If not defined, value will be acquired from HostMetadataItem.
#
# Mandatory: no
# Range: 0-255 characters
# Default:
# HostMetadata=
### Option: HostMetadataItem
# Optional parameter that defines an item used for getting host metadata.
# Host metadata is used at host auto-registration process.
# During an auto-registration request an agent will log a warning message if
# the value returned by specified item is over limit of 255 characters.
# This option is only used when HostMetadata is not defined.
#
# Mandatory: no
# Default:
# HostMetadataItem=
### Option: HostInterface
# Optional parameter that defines host interface.
# Host interface is used at host auto-registration process.
# An agent will issue an error and not start if the value is over limit of 255 characters.
# If not defined, value will be acquired from HostInterfaceItem.
#
# Mandatory: no
# Range: 0-255 characters
# Default:
# HostInterface=
### Option: HostInterfaceItem
# Optional parameter that defines an item used for getting host interface.
# Host interface is used at host auto-registration process.
# During an auto-registration request an agent will log a warning message if
# the value returned by specified item is over limit of 255 characters.
# This option is only used when HostInterface is not defined.
#
# Mandatory: no
# Default:
# HostInterfaceItem=
### Option: RefreshActiveChecks
# How often list of active checks is refreshed, in seconds.
#
# Mandatory: no
# Range: 60-3600
# Default:
# RefreshActiveChecks=120
### Option: BufferSend
# Do not keep data longer than N seconds in buffer.
#
# Mandatory: no
# Range: 1-3600
# Default:
# BufferSend=5
### Option: BufferSize
# Maximum number of values in a memory buffer. The agent will send
# all collected data to Zabbix Server or Proxy if the buffer is full.
# Option is not valid if EnablePersistentBuffer=1
#
# Mandatory: no
# Range: 2-65535
# Default:
# BufferSize=100
### Option: EnablePersistentBuffer
# Enable usage of local persistent storage for active items.
# 0 - disabled, in-memory buffer is used (default); 1 - use persistent buffer
# Mandatory: no
# Range: 0-1
# Default:
# EnablePersistentBuffer=0
### Option: PersistentBufferPeriod
# Zabbix Agent2 will keep data for this time period in case of no
# connectivity with Zabbix server or proxy. Older data will be lost. Log data will be preserved.
# Option is valid if EnablePersistentBuffer=1
#
# Mandatory: no
# Range: 1m-365d
# Default:
# PersistentBufferPeriod=1h
### Option: PersistentBufferFile
# Full filename. Zabbix Agent2 will keep SQLite database in this file.
# Option is valid if EnablePersistentBuffer=1
#
# Mandatory: no
# Default:
# PersistentBufferFile=
############ ADVANCED PARAMETERS #################
### Option: Alias
# Sets an alias for an item key. It can be used to substitute long and complex item key with a smaller and simpler one.
# Multiple Alias parameters may be present. Multiple parameters with the same Alias key are not allowed.
# Different Alias keys may reference the same item key.
# For example, to retrieve the ID of user 'zabbix':
# Alias=zabbix.userid:vfs.file.regexp[/etc/passwd,^zabbix:.:([0-9]+),,,,\1]
# Now shorthand key zabbix.userid may be used to retrieve data.
# Aliases can be used in HostMetadataItem but not in HostnameItem parameters.
#
# Mandatory: no
# Range:
# Default:
### Option: Timeout
# Spend no more than Timeout seconds on processing
#
# Mandatory: no
# Range: 1-30
# Default:
# Timeout=3
### Option: Include
# You may include individual files or all files in a directory in the configuration file.
# Installing Zabbix will create include directory in /usr/local/etc, unless modified during the compile time.
#
# Mandatory: no
# Default:
# Include=
Include=/etc/zabbix/zabbix_agent2.d/*.conf
# Include=/usr/local/etc/zabbix_agent2.userparams.conf
# Include=/usr/local/etc/zabbix_agent2.conf.d/
# Include=/usr/local/etc/zabbix_agent2.conf.d/*.conf
### Option:PluginTimeout
# Timeout for connections with external plugins.
#
# Mandatory: no
# Range: 1-30
# Default: <Global timeout>
# PluginTimeout=
### Option:PluginSocket
# Path to unix socket for external plugin communications.
#
# Mandatory: no
# Default:/tmp/agent.plugin.sock
# PluginSocket=
####### USER-DEFINED MONITORED PARAMETERS #######
### Option: UnsafeUserParameters
# Allow all characters to be passed in arguments to user-defined parameters.
# The following characters are not allowed:
# \ ' " ` * ? [ ] { } ~ $ ! & ; ( ) < > | # #
# Additionally, newline characters are not allowed.
# 0 - do not allow
# 1 - allow
#
# Mandatory: no
# Range: 0-1
# Default:
# UnsafeUserParameters=0
### Option: UserParameter
# User-defined parameter to monitor. There can be several user-defined parameters.
# Format: UserParameter=<key>,<shell command>
# See 'zabbix_agentd' directory for examples.
#
# Mandatory: no
# Default:
# UserParameter=
### Option: UserParameterDir
# Directory to execute UserParameter commands from. Only one entry is allowed.
# When executing UserParameter commands the agent will change the working directory to the one
# specified in the UserParameterDir option.
# This way UserParameter commands can be specified using the relative ./ prefix.
#
# Mandatory: no
# Default:
# UserParameterDir=
### Option: ControlSocket
# The control socket, used to send runtime commands with '-R' option.
#
# Mandatory: no
# Default:
# ControlSocket=
ControlSocket=/tmp/agent.sock
####### TLS-RELATED PARAMETERS #######
### Option: TLSConnect
# How the agent should connect to server or proxy. Used for active checks.
# Only one value can be specified:
# unencrypted - connect without encryption
# psk - connect using TLS and a pre-shared key
# cert - connect using TLS and a certificate
#
# Mandatory: yes, if TLS certificate or PSK parameters are defined (even for 'unencrypted' connection)
# Default:
# TLSConnect=unencrypted
### Option: TLSAccept
# What incoming connections to accept.
# Multiple values can be specified, separated by comma:
# unencrypted - accept connections without encryption
# psk - accept connections secured with TLS and a pre-shared key
# cert - accept connections secured with TLS and a certificate
#
# Mandatory: yes, if TLS certificate or PSK parameters are defined (even for 'unencrypted' connection)
# Default:
# TLSAccept=unencrypted
### Option: TLSCAFile
# Full pathname of a file containing the top-level CA(s) certificates for
# peer certificate verification.
#
# Mandatory: no
# Default:
# TLSCAFile=
### Option: TLSCRLFile
# Full pathname of a file containing revoked certificates.
#
# Mandatory: no
# Default:
# TLSCRLFile=
### Option: TLSServerCertIssuer
# Allowed server certificate issuer.
#
# Mandatory: no
# Default:
# TLSServerCertIssuer=
### Option: TLSServerCertSubject
# Allowed server certificate subject.
#
# Mandatory: no
# Default:
# TLSServerCertSubject=
### Option: TLSCertFile
# Full pathname of a file containing the agent certificate or certificate chain.
#
# Mandatory: no
# Default:
# TLSCertFile=
### Option: TLSKeyFile
# Full pathname of a file containing the agent private key.
#
# Mandatory: no
# Default:
# TLSKeyFile=
### Option: TLSPSKIdentity
# Unique, case sensitive string used to identify the pre-shared key.
#
# Mandatory: no
# Default:
# TLSPSKIdentity=
### Option: TLSPSKFile
# Full pathname of a file containing the pre-shared key.
#
# Mandatory: no
# Default:
# TLSPSKFile=
####### PLUGIN-SPECIFIC PARAMETERS #######
### Option: Plugins
# A plugin can have one or more plugin specific configuration parameters in format:
# Plugins.<PluginName>.<Parameter1>=<value1>
# Plugins.<PluginName>.<Parameter2>=<value2>
#
# Mandatory: no
# Range:
# Default:
### Option: Plugins.Log.MaxLinesPerSecond
# Maximum number of new lines the agent will send per second to Zabbix Server
# or Proxy processing 'log' and 'logrt' active checks.
# The provided value will be overridden by the parameter 'maxlines',
# provided in 'log' or 'logrt' item keys.
#
# Mandatory: no
# Range: 1-1000
# Default:
# Plugins.Log.MaxLinesPerSecond=20
### Option: AllowKey
# Allow execution of item keys matching pattern.
# Multiple keys matching rules may be defined in combination with DenyKey.
# Key pattern is wildcard expression, which support "*" character to match any number of any characters in certain position. It might be used in both key name and key arguments.
# Parameters are processed one by one according their appearance order.
# If no AllowKey or DenyKey rules defined, all keys are allowed.
#
# Mandatory: no
### Option: DenyKey
# Deny execution of items keys matching pattern.
# Multiple keys matching rules may be defined in combination with AllowKey.
# Key pattern is wildcard expression, which support "*" character to match any number of any characters in certain position. It might be used in both key name and key arguments.
# Parameters are processed one by one according their appearance order.
# If no AllowKey or DenyKey rules defined, all keys are allowed.
# Unless another system.run[*] rule is specified DenyKey=system.run[*] is added by default.
#
# Mandatory: no
# Default:
# DenyKey=system.run[*]
### Option: Plugins.SystemRun.LogRemoteCommands
# Enable logging of executed shell commands as warnings.
# 0 - disabled
# 1 - enabled
#
# Mandatory: no
# Default:
# Plugins.SystemRun.LogRemoteCommands=0
### Option: ForceActiveChecksOnStart
# Perform active checks immediately after restart for first received configuration.
# Also available as per plugin configuration, example: Plugins.Uptime.System.ForceActiveChecksOnStart=1
#
# Mandatory: no
# Range: 0-1
# Default:
# ForceActiveChecksOnStart=0
# Include configuration files for plugins
Include=./zabbix_agent2.d/plugins.d/*.conf
The answer is to put both backend and frontend ip adresses in the server= and serveractive= in the conf file
In my case :
Server=172.16.239.40, 172.16.238.2
ServerActive=172.16.239.40, 172.16.238.2

caffe, Train model with python layer got an error

I try to run the linreg example in /examples/pycaffe.
I added a solver.prototxt for it.
Then I try to train it, but I got an error.
Can you help me?
the solver.prototxt I added:
net: "${caffe_dir}/examples/pycaffe/linreg.prototxt"
display: 40
base_lr: 0.1
lr_policy: "step"
stepsize: 320000
gamma: 0.96
max_iter: 600000
momentum: 0.9
weight_decay: 0.0001
snapshot: 40000
snapshot_prefix: "test"
solver_mode: CPU
the error info is:
I0815 19:23:26.105624 7281 layer_factory.hpp:77] Creating layer loss
*** Aborted at 1502796212 (unix time) try "date -d #1502796212" if you are using GNU date ***
PC: # 0x7f6713ded458 (unknown)
*** SIGSEGV (#0x0) received by PID 7281 (TID 0x7f671559aa40) from PID 0; stack trace: ***
# 0x7f6712fd9cb0 (unknown)
# 0x7f6713ded458 (unknown)
# 0x7f66ce27bd40 google::protobuf::python::message_meta::AddFieldNumberToClass()
# 0x7f66ce27c1e6 google::protobuf::python::message_meta::New()
# 0x7f6713699953 (unknown)
# 0x7f67136970a3 (unknown)
# 0x7f67136be2d6 (unknown)
# 0x7f67136c2c3d (unknown)
# 0x7f67136c2f22 (unknown)
# 0x7f67136c377c (unknown)
# 0x7f6713726ef6 (unknown)
# 0x7f6713727609 (unknown)
# 0x7f671362747f (unknown)
# 0x7f6713720a46 (unknown)
# 0x7f67136f485f (unknown)
# 0x7f67136970a3 (unknown)
# 0x7f671372a5f7 (unknown)
# 0x7f67136be6d3 (unknown)
# 0x7f67136c2c3d (unknown)
# 0x7f67136c2f22 (unknown)
# 0x7f67136c377c (unknown)
# 0x7f6713726ef6 (unknown)
# 0x7f6713727609 (unknown)
# 0x7f671362747f (unknown)
# 0x7f6713720a46 (unknown)
# 0x7f67136f485f (unknown)
# 0x7f67136970a3 (unknown)
# 0x7f671372a5f7 (unknown)
# 0x7f67136be6d3 (unknown)
# 0x7f67136c2c3d (unknown)
# 0x7f67136c2f22 (unknown)
# 0x7f67136c377c (unknown)
Segmentation fault (core dumped)
this sentence has no sense. It's just a tail.
this sentence has no sense. It's just a tail.
this sentence has no sense. It's just a tail.
this sentence has no sense. It's just a tail.
this sentence has no sense. It's just a tail.
this sentence has no sense. It's just a tail.
solved !
That's because the python-protobuf's version is 3.2.0, but the protobuf installed by sudo apt-get install libprotobuf-dev is 2.5.0 version.
So, make the two have same version can solve this problem.

Multiple-GPU training not available in caffe

I run into problems when trying to use caffe with multiple gpus. When executing following command, I get the error log show below:
caffe train -solver $SOLVER -gpu 0,1 2>&1 | tee $LOGGING
F0409 14:17:22.355074 12079 caffe.cpp:254] Multi-GPU execution not available - rebuild with USE_NCCL
*** Check failure stack trace: ***
# 0x2aee66002b2d google::LogMessage::Fail()
# 0x2aee66004995 google::LogMessage::SendToLog()
# 0x2aee660026a9 google::LogMessage::Flush()
# 0x2aee6600542e google::LogMessageFatal::~LogMessageFatal()
# 0x40c172 train()
# 0x4084f3 main
# 0x2aee78f67b35 __libc_start_main
# 0x408f0b (unknown)
Can anyone explain what is wrong here? Is there some caffe bug which I am not aware of?
Install CUDA
Install cuDNN
Install Dependencies
$ sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler libgflags-dev libgoogle-glog-dev liblmdb-dev libatlas-base-dev git
$ sudo apt-get install --no-install-recommends libboost-all-dev
Install NCCL
NVIDIA NCCL is required to run Caffe on more than one GPU. NCCL can be installed with the following commands:
$ git clone https://github.com/NVIDIA/nccl.git
$ cd nccl
$ sudo make install -j
NCCL libraries and headers will be installed in /usr/local/lib and /usr/local/include.
Install Caffe
Uncomment the line USE_CUDNN := 1. This enables cuDNN acceleration.
Uncomment the line USE_NCCL := 1. This enables NCCL which is required to run Caffe on multiple GPUs.
Save and close the file. You're now ready to compile Caffe.
$ make all -j
When this command completes, the Caffe binary will be available at build/tools/caffe.

Caffe Installation Issues

I'm having some problem while installing Caffe. Please let me know if anyone have come across the same issue. Thanks.
make runtest
.build_release/test/test_all.testbin 0 --gtest_shuffle
Cuda number of devices: 1
Setting to use device 0
Current device id: 0
Note: Randomizing tests' orders with a seed of 88789 .
[==========] Running 838 tests from 169 test cases.
[----------] Global test environment set-up.
[----------] 3 tests from ImageDataLayerTest/3, where TypeParam = caffe::DoubleGPU
[ RUN ] ImageDataLayerTest/3.TestResize
F0107 14:26:04.664185 3079 math_functions.cpp:91] Check failed: error == cudaSuccess (11 vs. 0) invalid argument
* Check failure stack trace: *
# 0x2ab3f5243daa (unknown)
# 0x2ab3f5243ce4 (unknown)
# 0x2ab3f52436e6 (unknown)
# 0x2ab3f5246687 (unknown)
# 0x6bdc35 caffe::caffe_copy<>()
# 0x7439af caffe::BasePrefetchingDataLayer<>::Forward_gpu()
# 0x428da2 caffe::Layer<>::Forward()
# 0x62ff53 caffe::ImageDataLayerTest_TestResize_Test<>::TestBody()
# 0x657363 testing::internal::HandleExceptionsInMethodIfSupported<>()
# 0x64de07 testing::Test::Run()
# 0x64deae testing::TestInfo::Run()
# 0x64dfb5 testing::TestCase::Run()
# 0x6512f8 testing::internal::UnitTestImpl::RunAllTests()
# 0x651587 testing::UnitTest::Run()
# 0x41d3a0 main
# 0x2ab3f8396ec5 (unknown)
# 0x4243d7 (unknown)
# (nil) (unknown)
make: *** [runtest] Aborted (core dumped)
#
Ubuntu 14.04
/$ lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation G92GLM [Quadro FX 3800M] (rev a2)
/$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 340.29 Thu Jul 31 20:23:19 PDT 2014
GCC version: gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1)
/$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2014 NVIDIA Corporation
Built on Thu_Jul_17_21:41:27_CDT_2014
Cuda compilation tools, release 6.5, V6.5.12
#
There seems to be a problem with the GPU support. Maybe it does not support your GPU. I would try installing Caffe without GPU support. All you need to do is to uncomment
CPU_ONLY := 1
in Makefile.config and then make again. Here are the instructions.