Cannot find mean file while training on Imagenet - caffe

I am trying to train and validate a network on Imagenet. The validation process works without any problems (with the pretrained weights). However, when I try to run the training, I get an error saying that the imagenet_mean.binaryproto file cannot be found; the very same file that worked for the validation process. What is wrong?
...
I0222 15:29:15.108032 15823 net.cpp:399] data -> label
I0222 15:29:15.108057 15823 data_transformer.cpp:25] Loading mean file from: /home/myuser/learning/caffe/data/ilsvrc12/imagenet_mean.binaryproto
F0222 15:29:15.108577 15830 db_lmdb.hpp:14] Check failed: mdb_status == 0 (2 vs. 0) No such file or directory
*** Check failure stack trace: ***
# 0x7fc82857edaa (unknown)
# 0x7fc82857ece4 (unknown)
# 0x7fc82857e6e6 (unknown)
# 0x7fc828581687 (unknown)
# 0x7fc828ba115e caffe::db::LMDB::Open()
# 0x7fc828b75644 caffe::DataReader::Body::InternalThreadEntry()
# 0x7fc828cc1470 caffe::InternalThread::entry()
# 0x7fc81f4a8a4a (unknown)
# 0x7fc826a98184 start_thread
# 0x7fc8271b437d (unknown)
# (nil) (unknown)
Aborted (core dumped)
Here is the prototxt I am using:
name: "CaffeNet"
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 227
    mean_file: "/home/myuser/learning/caffe/data/ilsvrc12/imagenet_mean.binaryproto"
    #mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
  }
  # mean pixel / channel-wise mean instead of mean image
  # transform_param {
  #   crop_size: 227
  #   mean_value: 104
  #   mean_value: 117
  #   mean_value: 123
  #   mirror: true
  # }
  data_param {
    source: "examples/imagenet/ilsvrc12_train_lmdb"
    batch_size: 256
    backend: LMDB
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
    crop_size: 227
    mean_file: "/home/myuser/learning/caffe/data/ilsvrc12/imagenet_mean.binaryproto"
    #mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
  }
  # mean pixel / channel-wise mean instead of mean image
  # transform_param {
  #   crop_size: 227
  #   mean_value: 104
  #   mean_value: 117
  #   mean_value: 123
  #   mirror: false
  # }
  data_param {
    source: "/sdc/repository/myuser/Imagenet2012/Imagenet2012trainLMDB"
    #source: "examples/imagenet/ilsvrc12_val_lmdb"
    batch_size: 50
    backend: LMDB
  }
}
layer {
  name: "conv1"
  …
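One thing worth double-checking: the failing frame in the stack trace is caffe::db::LMDB::Open() (reached via DataReader), not the mean-file loader, and the TRAIN data_param uses the relative source examples/imagenet/ilsvrc12_train_lmdb while the mean file is an absolute path. A minimal sanity check, run from the directory you launch training from (paths copied from the prototxt above; adjust if yours differ):
# The mean file is referenced by an absolute path, so this should succeed from anywhere:
ls -l /home/myuser/learning/caffe/data/ilsvrc12/imagenet_mean.binaryproto
# The train LMDB is referenced by a relative path, so it must exist
# relative to the current working directory:
ls -ld examples/imagenet/ilsvrc12_train_lmdb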

Related

Percona MySQL Kubernetes operator deleting S3 backup files (.md5 and sst_info) during namespace deletion

When I run the Percona MySQL Kubernetes operator, everything works fine; even restoring backups works at first.
The problem: the backup files are sent to AWS S3 and consist of two folders (full, sst_info) and a .md5 file.
All of these files are required to restore the database. But when the namespace in the Kubernetes cluster is deleted, the sst_info folder and the .md5 file are deleted from the S3 bucket as well, which means the backup can no longer be restored afterwards.
[S3 screenshot of backup folders and file before namespace deletion]
[S3 screenshot of backup folders and file after namespace deletion]
Any help will be appreciated.
Percona MySQL operator from: https://github.com/percona/percona-xtradb-cluster-operator
my cr.yaml file:
apiVersion: pxc.percona.com/v1-11-0
kind: PerconaXtraDBCluster
metadata:
name: cluster1
finalizers:
- delete-pxc-pods-in-order
# - delete-proxysql-pvc
# - delete-pxc-pvc
# annotations:
# percona.com/issue-vault-token: "true"
spec:
crVersion: 1.11.0
# secretsName: my-cluster-secrets
# vaultSecretName: keyring-secret-vault
# sslSecretName: my-cluster-ssl
# sslInternalSecretName: my-cluster-ssl-internal
# logCollectorSecretName: my-log-collector-secrets
# initImage: percona/percona-xtradb-cluster-operator:1.11.0
# enableCRValidationWebhook: true
# tls:
# SANs:
# - pxc-1.example.com
# - pxc-2.example.com
# - pxc-3.example.com
# issuerConf:
# name: special-selfsigned-issuer
# kind: ClusterIssuer
# group: cert-manager.io
allowUnsafeConfigurations: false
# pause: false
updateStrategy: SmartUpdate
upgradeOptions:
versionServiceEndpoint: https://check.percona.com
apply: 8.0-recommended
schedule: "0 4 * * *"
pxc:
size: 3
image: percona/percona-xtradb-cluster:8.0.27-18.1
autoRecovery: true
# expose:
# enabled: true
# type: LoadBalancer
# trafficPolicy: Local
# loadBalancerSourceRanges:
# - 10.0.0.0/8
# annotations:
# networking.gke.io/load-balancer-type: "Internal"
# replicationChannels:
# - name: pxc1_to_pxc2
# isSource: true
# - name: pxc2_to_pxc1
# isSource: false
# configuration:
# sourceRetryCount: 3
# sourceConnectRetry: 60
# sourcesList:
# - host: 10.95.251.101
# port: 3306
# weight: 100
# schedulerName: mycustom-scheduler
# readinessDelaySec: 15
# livenessDelaySec: 600
# configuration: |
# [mysqld]
# wsrep_debug=CLIENT
# wsrep_provider_options="gcache.size=1G; gcache.recover=yes"
# [sst]
# xbstream-opts=--decompress
# [xtrabackup]
# compress=lz4
# for PXC 5.7
# [xtrabackup]
# compress
# imagePullSecrets:
# - name: private-registry-credentials
# priorityClassName: high-priority
# annotations:
# iam.amazonaws.com/role: role-arn
# labels:
# rack: rack-22
# readinessProbes:
# initialDelaySeconds: 15
# timeoutSeconds: 15
# periodSeconds: 30
# successThreshold: 1
# failureThreshold: 5
# livenessProbes:
# initialDelaySeconds: 300
# timeoutSeconds: 5
# periodSeconds: 10
# successThreshold: 1
# failureThreshold: 3
# containerSecurityContext:
# privileged: false
# podSecurityContext:
# runAsUser: 1001
# runAsGroup: 1001
# supplementalGroups: [1001]
# serviceAccountName: percona-xtradb-cluster-operator-workload
# imagePullPolicy: Always
# runtimeClassName: image-rc
# sidecars:
# - image: busybox
# command: ["/bin/sh"]
# args: ["-c", "while true; do trap 'exit 0' SIGINT SIGTERM SIGQUIT SIGKILL; done;"]
# name: my-sidecar-1
# resources:
# requests:
# #memory: 100M
# cpu: 100m
# limits:
# #memory: 200M
# cpu: 200m
# envVarsSecret: my-env-var-secrets
#resources:
#requests:
#memory: 1G
#cpu: 600m
# ephemeral-storage: 1G
# limits:
# #memory: 1G
# cpu: "1"
# ephemeral-storage: 1G
# nodeSelector:
# disktype: ssd
# affinity:
# antiAffinityTopologyKey: "kubernetes.io/hostname"
# antiAffinityTopologyKey: "kubernetes.io/hostname"
# advanced:
# nodeAffinity:
# requiredDuringSchedulingIgnoredDuringExecution:
# nodeSelectorTerms:
# - matchExpressions:
# - key: kubernetes.io/e2e-az-name
# operator: In
# values:
# - e2e-az1
# - e2e-az2
# tolerations:
# - key: "node.alpha.kubernetes.io/unreachable"
# operator: "Exists"
# effect: "NoExecute"
# tolerationSeconds: 6000
podDisruptionBudget:
maxUnavailable: 1
# minAvailable: 0
volumeSpec:
# emptyDir: {}
# hostPath:
# path: /data
# type: Directory
persistentVolumeClaim:
# storageClassName: standard
# accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 6G
gracePeriod: 600
haproxy:
enabled: true
size: 3
image: percona/percona-xtradb-cluster-operator:1.11.0-haproxy
# replicasServiceEnabled: false
# imagePullPolicy: Always
# schedulerName: mycustom-scheduler
# readinessDelaySec: 15
# livenessDelaySec: 600
# configuration: |
#
# the actual default configuration file can be found here https://github.com/percona/percona-docker/blob/main/haproxy/dockerdir/etc/haproxy/haproxy-global.cfg
#
# global
# maxconn 2048
# external-check
# insecure-fork-wanted
# stats socket /etc/haproxy/pxc/haproxy.sock mode 600 expose-fd listeners level admin
#
# defaults
# default-server init-addr last,libc,none
# log global
# mode tcp
# retries 10
# timeout client 28800s
# timeout connect 100500
# timeout server 28800s
#
# frontend galera-in
# bind *:3309 accept-proxy
# bind *:3306
# mode tcp
# option clitcpka
# default_backend galera-nodes
#
# frontend galera-admin-in
# bind *:33062
# mode tcp
# option clitcpka
# default_backend galera-admin-nodes
#
# frontend galera-replica-in
# bind *:3307
# mode tcp
# option clitcpka
# default_backend galera-replica-nodes
#
# frontend galera-mysqlx-in
# bind *:33060
# mode tcp
# option clitcpka
# default_backend galera-mysqlx-nodes
#
# frontend stats
# bind *:8404
# mode http
# option http-use-htx
# http-request use-service prometheus-exporter if { path /metrics }
# imagePullSecrets:
# - name: private-registry-credentials
# annotations:
# iam.amazonaws.com/role: role-arn
# labels:
# rack: rack-22
# readinessProbes:
# initialDelaySeconds: 15
# timeoutSeconds: 1
# periodSeconds: 5
# successThreshold: 1
# failureThreshold: 3
# livenessProbes:
# initialDelaySeconds: 60
# timeoutSeconds: 5
# periodSeconds: 30
# successThreshold: 1
# failureThreshold: 4
# serviceType: ClusterIP
# externalTrafficPolicy: Cluster
# replicasServiceType: ClusterIP
# replicasExternalTrafficPolicy: Cluster
# runtimeClassName: image-rc
# sidecars:
# - image: busybox
# command: ["/bin/sh"]
# args: ["-c", "while true; do trap 'exit 0' SIGINT SIGTERM SIGQUIT SIGKILL; done;"]
# name: my-sidecar-1
# resources:
# requests:
# #memory: 100M
# cpu: 100m
# limits:
# #memory: 200M
# cpu: 200m
# envVarsSecret: my-env-var-secrets
#resources:
#requests:
#memory: 1G
#cpu: 600m
# limits:
# #memory: 1G
# cpu: 700m
# priorityClassName: high-priority
# nodeSelector:
# disktype: ssd
# sidecarResources:
# requests:
# #memory: 1G
# cpu: 500m
# limits:
# #memory: 2G
# cpu: 600m
# containerSecurityContext:
# privileged: false
# podSecurityContext:
# runAsUser: 1001
# runAsGroup: 1001
# supplementalGroups: [1001]
# serviceAccountName: percona-xtradb-cluster-operator-workload
affinity:
antiAffinityTopologyKey: "kubernetes.io/hostname"
# advanced:
# nodeAffinity:
# requiredDuringSchedulingIgnoredDuringExecution:
# nodeSelectorTerms:
# - matchExpressions:
# - key: kubernetes.io/e2e-az-name
# operator: In
# values:
# - e2e-az1
# - e2e-az2
# tolerations:
# - key: "node.alpha.kubernetes.io/unreachable"
# operator: "Exists"
# effect: "NoExecute"
# tolerationSeconds: 6000
podDisruptionBudget:
maxUnavailable: 1
# minAvailable: 0
gracePeriod: 30
# loadBalancerSourceRanges:
# - 10.0.0.0/8
# serviceAnnotations:
# service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
# serviceLabels:
# rack: rack-23
proxysql:
enabled: false
size: 3
image: percona/percona-xtradb-cluster-operator:1.11.0-proxysql
# imagePullPolicy: Always
# configuration: |
# datadir="/var/lib/proxysql"
#
# admin_variables =
# {
# admin_credentials="proxyadmin:admin_password"
# mysql_ifaces="0.0.0.0:6032"
# refresh_interval=2000
#
# cluster_username="proxyadmin"
# cluster_password="admin_password"
# checksum_admin_variables=false
# checksum_ldap_variables=false
# checksum_mysql_variables=false
# cluster_check_interval_ms=200
# cluster_check_status_frequency=100
# cluster_mysql_query_rules_save_to_disk=true
# cluster_mysql_servers_save_to_disk=true
# cluster_mysql_users_save_to_disk=true
# cluster_proxysql_servers_save_to_disk=true
# cluster_mysql_query_rules_diffs_before_sync=1
# cluster_mysql_servers_diffs_before_sync=1
# cluster_mysql_users_diffs_before_sync=1
# cluster_proxysql_servers_diffs_before_sync=1
# }
#
# mysql_variables=
# {
# monitor_password="monitor"
# monitor_galera_healthcheck_interval=1000
# threads=2
# max_connections=2048
# default_query_delay=0
# default_query_timeout=10000
# poll_timeout=2000
# interfaces="0.0.0.0:3306"
# default_schema="information_schema"
# stacksize=1048576
# connect_timeout_server=10000
# monitor_history=60000
# monitor_connect_interval=20000
# monitor_ping_interval=10000
# ping_timeout_server=200
# commands_stats=true
# sessions_sort=true
# have_ssl=true
# ssl_p2s_ca="/etc/proxysql/ssl-internal/ca.crt"
# ssl_p2s_cert="/etc/proxysql/ssl-internal/tls.crt"
# ssl_p2s_key="/etc/proxysql/ssl-internal/tls.key"
# ssl_p2s_cipher="ECDHE-RSA-AES128-GCM-SHA256"
# }
# readinessDelaySec: 15
# livenessDelaySec: 600
# schedulerName: mycustom-scheduler
# imagePullSecrets:
# - name: private-registry-credentials
# annotations:
# iam.amazonaws.com/role: role-arn
# labels:
# rack: rack-22
# serviceType: ClusterIP
# externalTrafficPolicy: Cluster
# runtimeClassName: image-rc
# sidecars:
# - image: busybox
# command: ["/bin/sh"]
# args: ["-c", "while true; do trap 'exit 0' SIGINT SIGTERM SIGQUIT SIGKILL; done;"]
# name: my-sidecar-1
# resources:
# requests:
# #memory: 100M
# cpu: 100m
# limits:
# #memory: 200M
# cpu: 200m
# envVarsSecret: my-env-var-secrets
#resources:
#requests:
#memory: 1G
#cpu: 600m
# limits:
# #memory: 1G
# cpu: 700m
# priorityClassName: high-priority
# nodeSelector:
# disktype: ssd
# sidecarResources:
# requests:
# #memory: 1G
# cpu: 500m
# limits:
# #memory: 2G
# cpu: 600m
# containerSecurityContext:
# privileged: false
# podSecurityContext:
# runAsUser: 1001
# runAsGroup: 1001
# supplementalGroups: [1001]
# serviceAccountName: percona-xtradb-cluster-operator-workload
affinity:
antiAffinityTopologyKey: "kubernetes.io/hostname"
# advanced:
# nodeAffinity:
# requiredDuringSchedulingIgnoredDuringExecution:
# nodeSelectorTerms:
# - matchExpressions:
# - key: kubernetes.io/e2e-az-name
# operator: In
# values:
# - e2e-az1
# - e2e-az2
# tolerations:
# - key: "node.alpha.kubernetes.io/unreachable"
# operator: "Exists"
# effect: "NoExecute"
# tolerationSeconds: 6000
volumeSpec:
# emptyDir: {}
# hostPath:
# path: /data
# type: Directory
persistentVolumeClaim:
# storageClassName: standard
# accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 2G
podDisruptionBudget:
maxUnavailable: 1
# minAvailable: 0
gracePeriod: 30
# loadBalancerSourceRanges:
# - 10.0.0.0/8
# serviceAnnotations:
# service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
# serviceLabels:
# rack: rack-23
logcollector:
enabled: true
image: percona/percona-xtradb-cluster-operator:1.11.0-logcollector
# configuration: |
# [OUTPUT]
# Name es
# Match *
# Host 192.168.2.3
# Port 9200
# Index my_index
# Type my_type
#resources:
#requests:
#memory: 100M
#cpu: 200m
pmm:
enabled: false
image: percona/pmm-client:2.28.0
serverHost: monitoring-service
# serverUser: admin
# pxcParams: "--disable-tablestats-limit=2000"
# proxysqlParams: "--custom-labels=CUSTOM-LABELS"
#resources:
#requests:
#memory: 150M
#cpu: 300m
backup:
image: percona/percona-xtradb-cluster-operator:1.11.0-pxc8.0-backup
backoffLimit: 1
# serviceAccountName: percona-xtradb-cluster-operator
# imagePullSecrets:
# - name: private-registry-credentials
pitr:
enabled: true
storageName: s3-us-west-binlog
timeBetweenUploads: 60
# resources:
# requests:
# #memory: 0.1G
# cpu: 100m
# limits:
# #memory: 1G
# cpu: 700m
storages:
s3-us-west:
type: s3
verifyTLS: true
# nodeSelector:
# storage: tape
# backupWorker: 'True'
# resources:
# requests:
# #memory: 1G
# cpu: 600m
# affinity:
# nodeAffinity:
# requiredDuringSchedulingIgnoredDuringExecution:
# nodeSelectorTerms:
# - matchExpressions:
# - key: backupWorker
# operator: In
# values:
# - 'True'
# tolerations:
# - key: "backupWorker"
# operator: "Equal"
# value: "True"
# effect: "NoSchedule"
# annotations:
# testName: scheduled-backup
# labels:
# backupWorker: 'True'
# schedulerName: 'default-scheduler'
# priorityClassName: 'high-priority'
# containerSecurityContext:
# privileged: true
# podSecurityContext:
# fsGroup: 1001
# supplementalGroups: [1001, 1002, 1003]
s3:
bucket: abc-percona-mysql-test
credentialsSecret: my-cluster-name-backup-s3
region: us-west-1
s3-us-west-binlog:
type: s3
verifyTLS: true
s3:
bucket: abc-percona-mysql-test-binlogs
credentialsSecret: my-cluster-name-backup-s3
region: us-west-1
fs-pvc:
type: filesystem
# nodeSelector:
# storage: tape
# backupWorker: 'True'
# resources:
# requests:
# #memory: 1G
# cpu: 600m
# affinity:
# nodeAffinity:
# requiredDuringSchedulingIgnoredDuringExecution:
# nodeSelectorTerms:
# - matchExpressions:
# - key: backupWorker
# operator: In
# values:
# - 'True'
# tolerations:
# - key: "backupWorker"
# operator: "Equal"
# value: "True"
# effect: "NoSchedule"
# annotations:
# testName: scheduled-backup
# labels:
# backupWorker: 'True'
# schedulerName: 'default-scheduler'
# priorityClassName: 'high-priority'
# containerSecurityContext:
# privileged: true
# podSecurityContext:
# fsGroup: 1001
# supplementalGroups: [1001, 1002, 1003]
volume:
persistentVolumeClaim:
# storageClassName: standard
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 6G
schedule:
- name: "daily-backup-s3"
schedule: "0/5 * * * *"
keep: 10
storageName: s3-us-west
# - name: "daily-backup"
# schedule: "0 0 * * *"
# keep: 5
# storageName: fs-pvc
my restore.yaml file:
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterRestore
metadata:
  name: restore1
spec:
  pxcCluster: cluster1
  #backupName: backup1
  #backupName: cron-cluster1-s3-us-west-20226305240-scrvc
  backupSource:
    destination: s3://osos-percona-mysql-test/cluster1-2022-07-05-11:06:00-full
    s3:
      bucket: osos-percona-mysql-test
      credentialsSecret: my-cluster-name-backup-s3
      region: us-west-1
  pitr:
    type: latest
    date: "2022-07-05 11:10:30"
    backupSource:
      s3:
        bucket: osos-percona-mysql-test-binlogs
        credentialsSecret: my-cluster-name-backup-s3
        region: us-west-1
my operator.yaml file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: percona-xtradb-cluster-operator
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: operator
      app.kubernetes.io/instance: percona-xtradb-cluster-operator
      app.kubernetes.io/name: percona-xtradb-cluster-operator
      app.kubernetes.io/part-of: percona-xtradb-cluster-operator
  strategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app.kubernetes.io/component: operator
        app.kubernetes.io/instance: percona-xtradb-cluster-operator
        app.kubernetes.io/name: percona-xtradb-cluster-operator
        app.kubernetes.io/part-of: percona-xtradb-cluster-operator
    spec:
      terminationGracePeriodSeconds: 600
      containers:
        - command:
            - percona-xtradb-cluster-operator
          env:
            - name: WATCH_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: OPERATOR_NAME
              value: percona-xtradb-cluster-operator
          image: percona/percona-xtradb-cluster-operator:1.11.0
          imagePullPolicy: Always
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /metrics
              port: metrics
              scheme: HTTP
          resources:
            limits:
              cpu: 200m
              memory: 500Mi
            requests:
              cpu: 100m
              memory: 20Mi
          name: percona-xtradb-cluster-operator
          ports:
            - containerPort: 8080
              name: metrics
              protocol: TCP
      serviceAccountName: percona-xtradb-cluster-operator
Why are you deleting the namespace? What are you actually trying to achieve? Are you trying to delete an existing cluster created by the operator?
In that case, I recommend using the kubectl delete command to delete your cluster, like:
kubectl delete perconaxtradbcluster/cluster1
This ensures, by design, that all the related objects get cleaned up. There is no need to delete the namespace: that deletion can happen invisibly to the operator, which is the likely source of the issue you're seeing.
That is the point of using operators to interact with Kubernetes: there is no need for manual action, since the operator should provide maintenance utilities out of the box through manipulation of its objects :)
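As a minimal sketch of that cleanup order (assuming the cluster lives in a namespace called pxc, which is not shown in the question; adjust the names to your setup):
# Delete the cluster custom resource first and let the operator do the cleanup.
kubectl delete perconaxtradbcluster/cluster1 -n pxc
# Wait until the custom resource is actually gone.
kubectl wait --for=delete perconaxtradbcluster/cluster1 -n pxc --timeout=300s
# Only then, if you still need to, remove the namespace itself.
kubectl delete namespace pxc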

Problems training an RCF model using Caffe

I'm using Caffe to train a model. I am sure I have connected the data layer to train.txt via the source of image_data_param. But when I run ./train.sh, it always prompts that it cannot find the images.
Ubuntu 18.04, OpenCV 3, Python 2
layer {
  name: "data"
  type: "ImageLabelmapData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: false
    mean_value: 104.00699
    mean_value: 116.66877
    mean_value: 122.67892
  }
  image_data_param {
    root_folder: "/home/yogang/Desktop/rawdata/train/"
    source: "/home/yogang/Desktop/rawdata/train.txt"
    batch_size: 1
    shuffle: true
    new_height: 0
    new_width: 0
  }
}
I0828 19:29:13.834946 14079 layer_factory.hpp:77] Creating layer data
I0828 19:29:13.835011 14079 net.cpp:101] Creating Layer data
I0828 19:29:13.835031 14079 net.cpp:409] data -> data
I0828 19:29:13.835059 14079 net.cpp:409] data -> label
I0828 19:29:13.835124 14079 image_labelmap_data_layer.cpp:42] Opening file /home/yogang/Desktop/rawdata/train.txt
I0828 19:29:13.835505 14079 image_labelmap_data_layer.cpp:52] Shuffling data
I0828 19:29:13.835677 14079 image_labelmap_data_layer.cpp:57] A total of 242 images.
E0828 19:29:13.836748 14079 io.cpp:80] Could not open or find file /home/yogang/Desktop/rawdata/train//home/yogang/Desktop/rawdata/train/satellite144.jpg
E0828 19:29:13.836797 14079 io.cpp:80] Could not open or find file /home/yogang/Desktop/rawdata/train//home/yogang/Desktop/rawdata/train/400.jpg
F0828 19:29:13.836818 14079 image_labelmap_data_layer.cpp:86] Check failed: cv_img.data Could not load /home/yogang/Desktop/rawdata/train/satellite144.jpg
*** Check failure stack trace: ***
./train.sh: line 8: 14079 Aborted (core dumped) ./solve.py
I think the image paths are wrong; check your image paths.
The log shows root_folder being prepended to entries that are already absolute paths (note the doubled /home/yogang/Desktop/rawdata/train/ in the error), so the lines in train.txt should contain paths relative to root_folder (e.g. satellite144.jpg), not absolute paths.
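For example, a minimal sketch of one way to fix the list file (assuming every entry in train.txt starts with the absolute prefix shown in the log; sed keeps a .bak copy of the original):
# Strip the absolute prefix so the entries become relative to root_folder.
sed -i.bak 's|/home/yogang/Desktop/rawdata/train/||g' /home/yogang/Desktop/rawdata/train.txt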

Unknown blob input data to layer 0 in caffe

I am getting the following error while using my Caffe prototxt:
F0329 17:37:40.771555 24587 insert_splits.cpp:35] Unknown blob input data to layer 0
*** Check failure stack trace: ***
The first two layers of my Caffe prototxt are given below:
layers {
  name: "data"
  type: IMAGE_DATA
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  image_data_param {
    source: "train2.txt"
    batch_size: 100
    new_height: 28
    new_width: 28
    is_color: false
  }
}
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  blobs_lr: 1
  blobs_lr: 3
  convolution_param {
    num_output: 8
    kernel_size: 9
    stride: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
What could be the possible reason for this?
It seems like your IMAGE_DATA layer is only defined for the TRAIN phase. Thus the blobs data and label are not defined for the TEST phase. I suspect you see no error while the solver builds the train-phase net; the error only appears once the test-phase net is built.

Caffe: change data layer after surgery

I trained an FC network with an HDF5 data layer, then used net surgery to transplant it into a convolutional network, and then changed the data layer to a probe-suitable data layer, i.e.:
from:
layer {
  name: "layer_data_left"
  type: "HDF5Data"
  top: "data_left"
  top: "labels_left"
  include {
    phase: TRAIN
  }
  hdf5_data_param {
    source: "/home/me/Desktop/trainLeftPatches.txt"
    batch_size: 128
  }
}
to
layer {
  name: "data_left"
  type: "Input"
  top: "data_right"
  input_param { shape: { dim: 1 dim: 1 dim: 1241 dim: 367 } }
}
Is there any reason this would run out of memory?
>>> fc_net.forward()
F0729 20:02:02.205382 6821 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
Aborted (core dumped)
Or, is it more likely that I made a mistake somewhere in surgery & exchanging data layers?
Thank you.

How do I specify num_test_nets in restart?

I trained a GoogleNet model for a while, and now I'd like to restart from a checkpoint, adding a test phase. I have the test already in my train_val.prototxt file, and I added the proper parameters to my solver.prototxt ... but I get an error on the restart:
I0712 15:53:02.615947 47646 net.cpp:278] This network produces output loss2/loss1
I0712 15:53:02.615964 47646 net.cpp:278] This network produces output loss3/loss3
I0712 15:53:02.616109 47646 net.cpp:292] Network initialization done.
F0712 15:53:02.616665 47646 solver.cpp:128] Check failed: param_.test_iter_size() == num_test_nets (1 vs. 0) test_iter must be specified for each test network.
*** Check failure stack trace: ***
# 0x7f550cf70e6d (unknown)
# 0x7f550cf72ced (unknown)
# 0x7f550cf70a5c (unknown)
# 0x7f550cf7363e (unknown)
# 0x7f550d3b605b caffe::Solver<>::InitTestNets()
# 0x7f550d3b63ed caffe::Solver<>::Init()
# 0x7f550d3b6738 caffe::Solver<>::Solver()
# 0x7f550d4fa633 caffe::Creator_SGDSolver<>()
# 0x7f550da5bb76 caffe::SolverRegistry<>::CreateSolver()
# 0x7f550da548f4 train()
# 0x7f550da52316 main
# 0x7f5508f43b15 __libc_start_main
# 0x7f550da52d3d (unknown)
solver.prototxt
train_net: "<my_path>/train_val.prototxt"
test_iter: 1000
test_interval: 4000
test_initialization: false
display: 40
average_loss: 40
base_lr: 0.01
lr_policy: "step"
stepsize: 320000
gamma: 0.96
max_iter: 10000000
momentum: 0.9
weight_decay: 0.0002
snapshot: 40000
snapshot_prefix: "models/<my_path>"
solver_mode: CPU
train_val.prototxt train and test layers:
name: "GoogleNet"
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 224
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }
  data_param {
    source: "/<blah>/ilsvrc12_train_lmdb"
    batch_size: 32
    backend: LMDB
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: true
    crop_size: 224
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }
  data_param {
    source: "/<blah>/ilsvrc12_val_lmdb"
    batch_size: 32
    backend: LMDB
  }
}
You should change one line in your solver.prototxt, from
train_net: "/train_val.prototxt"
to
net: "/train_val.prototxt"
The solver does not use the value of "train_net" to initialize a test net, so the test phase you added is never found by the solver.
In fact, the parameters "train_net" and "test_net" each initialize only a train net or a test net, respectively, while "net" is used for both.
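For example, a minimal sketch of that one-line change (assuming the solver file is named solver.prototxt in the current directory; sed keeps a .bak copy):
# Rename the train_net key to net so the solver also builds the test net(s).
sed -i.bak 's/^train_net:/net:/' solver.prototxt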