When I run the Percona MySQL Kubernetes operator, everything works fine; even restoring backups works at first.
The problem is this: the backup files are sent to AWS S3, where each backup consists of two folders (full, sst_info) and a .md5 file.
All of these are required to restore the database. But when the namespace in the Kubernetes cluster is deleted, the sst_info folder and the .md5 file are deleted from the S3 bucket as well, so the backup can no longer be restored afterwards.
[Screenshot: S3 backup folders and file before namespace deletion]
[Screenshot: S3 backup folders and file after namespace deletion]
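To pin down exactly which pieces disappear, the object keys under the bucket can be compared before and after the namespace deletion. Below is a minimal sketch, assuming the keys have already been listed (for example with `aws s3 ls --recursive` or boto3's `list_objects_v2`) and assuming a layout of `<backup>/`, `<backup>.sst_info/` and `<backup>.md5`. The helper name, the layout and the sample file names are illustrative assumptions, not something the operator documents here.

```python
# Hypothetical helper: reports which of the three backup pieces described
# above (full folder, sst_info folder, .md5 file) are no longer present.
# The key layout below is an assumption; adjust it to what the bucket shows.
def missing_backup_parts(keys, backup_name):
    """Return the parts of `backup_name` that no key in `keys` accounts for."""
    expected = {
        "full": backup_name + "/",
        "sst_info": backup_name + ".sst_info/",
        "md5": backup_name + ".md5",
    }
    missing = []
    for part, prefix in expected.items():
        if not any(key.startswith(prefix) for key in keys):
            missing.append(part)
    return missing


if __name__ == "__main__":
    name = "cluster1-2022-07-05-11:06:00-full"
    # Sample keys are made up for illustration only.
    keys_after_deletion = [name + "/xtrabackup_info.00000000000000000000"]
    print(missing_backup_parts(keys_after_deletion, name))
```

Running the same check before and after the deletion would show whether only `sst_info` and `md5` go missing, which matches the symptom described above.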
Any help will be appreciated.
Percona MySQL operator from: https://github.com/percona/percona-xtradb-cluster-operator
My cr.yaml file:
apiVersion: pxc.percona.com/v1-11-0
kind: PerconaXtraDBCluster
metadata:
name: cluster1
finalizers:
- delete-pxc-pods-in-order
# - delete-proxysql-pvc
# - delete-pxc-pvc
# annotations:
# percona.com/issue-vault-token: "true"
spec:
crVersion: 1.11.0
# secretsName: my-cluster-secrets
# vaultSecretName: keyring-secret-vault
# sslSecretName: my-cluster-ssl
# sslInternalSecretName: my-cluster-ssl-internal
# logCollectorSecretName: my-log-collector-secrets
# initImage: percona/percona-xtradb-cluster-operator:1.11.0
# enableCRValidationWebhook: true
# tls:
# SANs:
# - pxc-1.example.com
# - pxc-2.example.com
# - pxc-3.example.com
# issuerConf:
# name: special-selfsigned-issuer
# kind: ClusterIssuer
# group: cert-manager.io
allowUnsafeConfigurations: false
# pause: false
updateStrategy: SmartUpdate
upgradeOptions:
versionServiceEndpoint: https://check.percona.com
apply: 8.0-recommended
schedule: "0 4 * * *"
pxc:
size: 3
image: percona/percona-xtradb-cluster:8.0.27-18.1
autoRecovery: true
# expose:
# enabled: true
# type: LoadBalancer
# trafficPolicy: Local
# loadBalancerSourceRanges:
# - 10.0.0.0/8
# annotations:
# networking.gke.io/load-balancer-type: "Internal"
# replicationChannels:
# - name: pxc1_to_pxc2
# isSource: true
# - name: pxc2_to_pxc1
# isSource: false
# configuration:
# sourceRetryCount: 3
# sourceConnectRetry: 60
# sourcesList:
# - host: 10.95.251.101
# port: 3306
# weight: 100
# schedulerName: mycustom-scheduler
# readinessDelaySec: 15
# livenessDelaySec: 600
# configuration: |
# [mysqld]
# wsrep_debug=CLIENT
# wsrep_provider_options="gcache.size=1G; gcache.recover=yes"
# [sst]
# xbstream-opts=--decompress
# [xtrabackup]
# compress=lz4
# for PXC 5.7
# [xtrabackup]
# compress
# imagePullSecrets:
# - name: private-registry-credentials
# priorityClassName: high-priority
# annotations:
# iam.amazonaws.com/role: role-arn
# labels:
# rack: rack-22
# readinessProbes:
# initialDelaySeconds: 15
# timeoutSeconds: 15
# periodSeconds: 30
# successThreshold: 1
# failureThreshold: 5
# livenessProbes:
# initialDelaySeconds: 300
# timeoutSeconds: 5
# periodSeconds: 10
# successThreshold: 1
# failureThreshold: 3
# containerSecurityContext:
# privileged: false
# podSecurityContext:
# runAsUser: 1001
# runAsGroup: 1001
# supplementalGroups: [1001]
# serviceAccountName: percona-xtradb-cluster-operator-workload
# imagePullPolicy: Always
# runtimeClassName: image-rc
# sidecars:
# - image: busybox
# command: ["/bin/sh"]
# args: ["-c", "while true; do trap 'exit 0' SIGINT SIGTERM SIGQUIT SIGKILL; done;"]
# name: my-sidecar-1
# resources:
# requests:
# #memory: 100M
# cpu: 100m
# limits:
# #memory: 200M
# cpu: 200m
# envVarsSecret: my-env-var-secrets
#resources:
#requests:
#memory: 1G
#cpu: 600m
# ephemeral-storage: 1G
# limits:
# #memory: 1G
# cpu: "1"
# ephemeral-storage: 1G
# nodeSelector:
# disktype: ssd
# affinity:
# antiAffinityTopologyKey: "kubernetes.io/hostname"
# antiAffinityTopologyKey: "kubernetes.io/hostname"
# advanced:
# nodeAffinity:
# requiredDuringSchedulingIgnoredDuringExecution:
# nodeSelectorTerms:
# - matchExpressions:
# - key: kubernetes.io/e2e-az-name
# operator: In
# values:
# - e2e-az1
# - e2e-az2
# tolerations:
# - key: "node.alpha.kubernetes.io/unreachable"
# operator: "Exists"
# effect: "NoExecute"
# tolerationSeconds: 6000
podDisruptionBudget:
maxUnavailable: 1
# minAvailable: 0
volumeSpec:
# emptyDir: {}
# hostPath:
# path: /data
# type: Directory
persistentVolumeClaim:
# storageClassName: standard
# accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 6G
gracePeriod: 600
haproxy:
enabled: true
size: 3
image: percona/percona-xtradb-cluster-operator:1.11.0-haproxy
# replicasServiceEnabled: false
# imagePullPolicy: Always
# schedulerName: mycustom-scheduler
# readinessDelaySec: 15
# livenessDelaySec: 600
# configuration: |
#
# the actual default configuration file can be found here https://github.com/percona/percona-docker/blob/main/haproxy/dockerdir/etc/haproxy/haproxy-global.cfg
#
# global
# maxconn 2048
# external-check
# insecure-fork-wanted
# stats socket /etc/haproxy/pxc/haproxy.sock mode 600 expose-fd listeners level admin
#
# defaults
# default-server init-addr last,libc,none
# log global
# mode tcp
# retries 10
# timeout client 28800s
# timeout connect 100500
# timeout server 28800s
#
# frontend galera-in
# bind *:3309 accept-proxy
# bind *:3306
# mode tcp
# option clitcpka
# default_backend galera-nodes
#
# frontend galera-admin-in
# bind *:33062
# mode tcp
# option clitcpka
# default_backend galera-admin-nodes
#
# frontend galera-replica-in
# bind *:3307
# mode tcp
# option clitcpka
# default_backend galera-replica-nodes
#
# frontend galera-mysqlx-in
# bind *:33060
# mode tcp
# option clitcpka
# default_backend galera-mysqlx-nodes
#
# frontend stats
# bind *:8404
# mode http
# option http-use-htx
# http-request use-service prometheus-exporter if { path /metrics }
# imagePullSecrets:
# - name: private-registry-credentials
# annotations:
# iam.amazonaws.com/role: role-arn
# labels:
# rack: rack-22
# readinessProbes:
# initialDelaySeconds: 15
# timeoutSeconds: 1
# periodSeconds: 5
# successThreshold: 1
# failureThreshold: 3
# livenessProbes:
# initialDelaySeconds: 60
# timeoutSeconds: 5
# periodSeconds: 30
# successThreshold: 1
# failureThreshold: 4
# serviceType: ClusterIP
# externalTrafficPolicy: Cluster
# replicasServiceType: ClusterIP
# replicasExternalTrafficPolicy: Cluster
# runtimeClassName: image-rc
# sidecars:
# - image: busybox
# command: ["/bin/sh"]
# args: ["-c", "while true; do trap 'exit 0' SIGINT SIGTERM SIGQUIT SIGKILL; done;"]
# name: my-sidecar-1
# resources:
# requests:
# #memory: 100M
# cpu: 100m
# limits:
# #memory: 200M
# cpu: 200m
# envVarsSecret: my-env-var-secrets
#resources:
#requests:
#memory: 1G
#cpu: 600m
# limits:
# #memory: 1G
# cpu: 700m
# priorityClassName: high-priority
# nodeSelector:
# disktype: ssd
# sidecarResources:
# requests:
# #memory: 1G
# cpu: 500m
# limits:
# #memory: 2G
# cpu: 600m
# containerSecurityContext:
# privileged: false
# podSecurityContext:
# runAsUser: 1001
# runAsGroup: 1001
# supplementalGroups: [1001]
# serviceAccountName: percona-xtradb-cluster-operator-workload
affinity:
antiAffinityTopologyKey: "kubernetes.io/hostname"
# advanced:
# nodeAffinity:
# requiredDuringSchedulingIgnoredDuringExecution:
# nodeSelectorTerms:
# - matchExpressions:
# - key: kubernetes.io/e2e-az-name
# operator: In
# values:
# - e2e-az1
# - e2e-az2
# tolerations:
# - key: "node.alpha.kubernetes.io/unreachable"
# operator: "Exists"
# effect: "NoExecute"
# tolerationSeconds: 6000
podDisruptionBudget:
maxUnavailable: 1
# minAvailable: 0
gracePeriod: 30
# loadBalancerSourceRanges:
# - 10.0.0.0/8
# serviceAnnotations:
# service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
# serviceLabels:
# rack: rack-23
proxysql:
enabled: false
size: 3
image: percona/percona-xtradb-cluster-operator:1.11.0-proxysql
# imagePullPolicy: Always
# configuration: |
# datadir="/var/lib/proxysql"
#
# admin_variables =
# {
# admin_credentials="proxyadmin:admin_password"
# mysql_ifaces="0.0.0.0:6032"
# refresh_interval=2000
#
# cluster_username="proxyadmin"
# cluster_password="admin_password"
# checksum_admin_variables=false
# checksum_ldap_variables=false
# checksum_mysql_variables=false
# cluster_check_interval_ms=200
# cluster_check_status_frequency=100
# cluster_mysql_query_rules_save_to_disk=true
# cluster_mysql_servers_save_to_disk=true
# cluster_mysql_users_save_to_disk=true
# cluster_proxysql_servers_save_to_disk=true
# cluster_mysql_query_rules_diffs_before_sync=1
# cluster_mysql_servers_diffs_before_sync=1
# cluster_mysql_users_diffs_before_sync=1
# cluster_proxysql_servers_diffs_before_sync=1
# }
#
# mysql_variables=
# {
# monitor_password="monitor"
# monitor_galera_healthcheck_interval=1000
# threads=2
# max_connections=2048
# default_query_delay=0
# default_query_timeout=10000
# poll_timeout=2000
# interfaces="0.0.0.0:3306"
# default_schema="information_schema"
# stacksize=1048576
# connect_timeout_server=10000
# monitor_history=60000
# monitor_connect_interval=20000
# monitor_ping_interval=10000
# ping_timeout_server=200
# commands_stats=true
# sessions_sort=true
# have_ssl=true
# ssl_p2s_ca="/etc/proxysql/ssl-internal/ca.crt"
# ssl_p2s_cert="/etc/proxysql/ssl-internal/tls.crt"
# ssl_p2s_key="/etc/proxysql/ssl-internal/tls.key"
# ssl_p2s_cipher="ECDHE-RSA-AES128-GCM-SHA256"
# }
# readinessDelaySec: 15
# livenessDelaySec: 600
# schedulerName: mycustom-scheduler
# imagePullSecrets:
# - name: private-registry-credentials
# annotations:
# iam.amazonaws.com/role: role-arn
# labels:
# rack: rack-22
# serviceType: ClusterIP
# externalTrafficPolicy: Cluster
# runtimeClassName: image-rc
# sidecars:
# - image: busybox
# command: ["/bin/sh"]
# args: ["-c", "while true; do trap 'exit 0' SIGINT SIGTERM SIGQUIT SIGKILL; done;"]
# name: my-sidecar-1
# resources:
# requests:
# #memory: 100M
# cpu: 100m
# limits:
# #memory: 200M
# cpu: 200m
# envVarsSecret: my-env-var-secrets
#resources:
#requests:
#memory: 1G
#cpu: 600m
# limits:
# #memory: 1G
# cpu: 700m
# priorityClassName: high-priority
# nodeSelector:
# disktype: ssd
# sidecarResources:
# requests:
# #memory: 1G
# cpu: 500m
# limits:
# #memory: 2G
# cpu: 600m
# containerSecurityContext:
# privileged: false
# podSecurityContext:
# runAsUser: 1001
# runAsGroup: 1001
# supplementalGroups: [1001]
# serviceAccountName: percona-xtradb-cluster-operator-workload
affinity:
antiAffinityTopologyKey: "kubernetes.io/hostname"
# advanced:
# nodeAffinity:
# requiredDuringSchedulingIgnoredDuringExecution:
# nodeSelectorTerms:
# - matchExpressions:
# - key: kubernetes.io/e2e-az-name
# operator: In
# values:
# - e2e-az1
# - e2e-az2
# tolerations:
# - key: "node.alpha.kubernetes.io/unreachable"
# operator: "Exists"
# effect: "NoExecute"
# tolerationSeconds: 6000
volumeSpec:
# emptyDir: {}
# hostPath:
# path: /data
# type: Directory
persistentVolumeClaim:
# storageClassName: standard
# accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 2G
podDisruptionBudget:
maxUnavailable: 1
# minAvailable: 0
gracePeriod: 30
# loadBalancerSourceRanges:
# - 10.0.0.0/8
# serviceAnnotations:
# service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
# serviceLabels:
# rack: rack-23
logcollector:
enabled: true
image: percona/percona-xtradb-cluster-operator:1.11.0-logcollector
# configuration: |
# [OUTPUT]
# Name es
# Match *
# Host 192.168.2.3
# Port 9200
# Index my_index
# Type my_type
#resources:
#requests:
#memory: 100M
#cpu: 200m
pmm:
enabled: false
image: percona/pmm-client:2.28.0
serverHost: monitoring-service
# serverUser: admin
# pxcParams: "--disable-tablestats-limit=2000"
# proxysqlParams: "--custom-labels=CUSTOM-LABELS"
#resources:
#requests:
#memory: 150M
#cpu: 300m
backup:
image: percona/percona-xtradb-cluster-operator:1.11.0-pxc8.0-backup
backoffLimit: 1
# serviceAccountName: percona-xtradb-cluster-operator
# imagePullSecrets:
# - name: private-registry-credentials
pitr:
enabled: true
storageName: s3-us-west-binlog
timeBetweenUploads: 60
# resources:
# requests:
# #memory: 0.1G
# cpu: 100m
# limits:
# #memory: 1G
# cpu: 700m
storages:
s3-us-west:
type: s3
verifyTLS: true
# nodeSelector:
# storage: tape
# backupWorker: 'True'
# resources:
# requests:
# #memory: 1G
# cpu: 600m
# affinity:
# nodeAffinity:
# requiredDuringSchedulingIgnoredDuringExecution:
# nodeSelectorTerms:
# - matchExpressions:
# - key: backupWorker
# operator: In
# values:
# - 'True'
# tolerations:
# - key: "backupWorker"
# operator: "Equal"
# value: "True"
# effect: "NoSchedule"
# annotations:
# testName: scheduled-backup
# labels:
# backupWorker: 'True'
# schedulerName: 'default-scheduler'
# priorityClassName: 'high-priority'
# containerSecurityContext:
# privileged: true
# podSecurityContext:
# fsGroup: 1001
# supplementalGroups: [1001, 1002, 1003]
s3:
bucket: abc-percona-mysql-test
credentialsSecret: my-cluster-name-backup-s3
region: us-west-1
s3-us-west-binlog:
type: s3
verifyTLS: true
s3:
bucket: abc-percona-mysql-test-binlogs
credentialsSecret: my-cluster-name-backup-s3
region: us-west-1
fs-pvc:
type: filesystem
# nodeSelector:
# storage: tape
# backupWorker: 'True'
# resources:
# requests:
# #memory: 1G
# cpu: 600m
# affinity:
# nodeAffinity:
# requiredDuringSchedulingIgnoredDuringExecution:
# nodeSelectorTerms:
# - matchExpressions:
# - key: backupWorker
# operator: In
# values:
# - 'True'
# tolerations:
# - key: "backupWorker"
# operator: "Equal"
# value: "True"
# effect: "NoSchedule"
# annotations:
# testName: scheduled-backup
# labels:
# backupWorker: 'True'
# schedulerName: 'default-scheduler'
# priorityClassName: 'high-priority'
# containerSecurityContext:
# privileged: true
# podSecurityContext:
# fsGroup: 1001
# supplementalGroups: [1001, 1002, 1003]
volume:
persistentVolumeClaim:
# storageClassName: standard
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 6G
schedule:
- name: "daily-backup-s3"
schedule: "0/5 * * * *"
keep: 10
storageName: s3-us-west
# - name: "daily-backup"
# schedule: "0 0 * * *"
# keep: 5
# storageName: fs-pvc
My restore.yaml file:
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterRestore
metadata:
name: restore1
spec:
pxcCluster: cluster1
#backupName: backup1
#backupName: cron-cluster1-s3-us-west-20226305240-scrvc
backupSource: #
destination: s3://osos-percona-mysql-test/cluster1-2022-07-05-11:06:00-full
s3:
bucket: osos-percona-mysql-test
credentialsSecret: my-cluster-name-backup-s3
region: us-west-1
pitr:
type: latest
date: "2022-07-05 11:10:30"
backupSource: #
s3:
bucket: osos-percona-mysql-test-binlogs
credentialsSecret: my-cluster-name-backup-s3
region: us-west-1
My operator.yaml file:
apiVersion: apps/v1
kind: Deployment
metadata:
name: percona-xtradb-cluster-operator
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/component: operator
app.kubernetes.io/instance: percona-xtradb-cluster-operator
app.kubernetes.io/name: percona-xtradb-cluster-operator
app.kubernetes.io/part-of: percona-xtradb-cluster-operator
strategy:
rollingUpdate:
maxUnavailable: 1
type: RollingUpdate
template:
metadata:
labels:
app.kubernetes.io/component: operator
app.kubernetes.io/instance: percona-xtradb-cluster-operator
app.kubernetes.io/name: percona-xtradb-cluster-operator
app.kubernetes.io/part-of: percona-xtradb-cluster-operator
spec:
terminationGracePeriodSeconds: 600
containers:
- command:
- percona-xtradb-cluster-operator
env:
- name: WATCH_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: OPERATOR_NAME
value: percona-xtradb-cluster-operator
image: percona/percona-xtradb-cluster-operator:1.11.0
imagePullPolicy: Always
livenessProbe:
failureThreshold: 3
httpGet:
path: /metrics
port: metrics
scheme: HTTP
resources:
limits:
cpu: 200m
memory: 500Mi
requests:
cpu: 100m
memory: 20Mi
name: percona-xtradb-cluster-operator
ports:
- containerPort: 8080
name: metrics
protocol: TCP
serviceAccountName: percona-xtradb-cluster-operator
Why are you deleting the namespace? What are you trying to achieve? Are you trying to delete an existing cluster created by the operator?
In that case, I would recommend using the kubectl delete command to delete your cluster, like:
kubectl delete perconaxtradbcluster/cluster1
This ensures, by design, that all the related objects get cleaned up. There is no need to delete the namespace; that deletion can be invisible to the operator, hence the issue you're seeing.
That is the point of using operators with Kubernetes: there is no need for manual action, since the operator should provide maintenance utilities out of the box through manipulation of the operator objects :)
I am trying to train and validate a network on ImageNet. Validation works without any problems (with the pretrained weights). However, when I try to train, I get an error that seems to say the imagenet_mean.binaryproto file cannot be found, even though the very same file worked for validation. What is wrong?
...
I0222 15:29:15.108032 15823 net.cpp:399] data -> label
I0222 15:29:15.108057 15823 data_transformer.cpp:25] Loading mean file from: /home/myuser/learning/caffe/data/ilsvrc12/imagenet_mean.binaryproto
F0222 15:29:15.108577 15830 db_lmdb.hpp:14] Check failed: mdb_status == 0 (2 vs. 0) No such file or directory
*** Check failure stack trace: ***
# 0x7fc82857edaa (unknown)
# 0x7fc82857ece4 (unknown)
# 0x7fc82857e6e6 (unknown)
# 0x7fc828581687 (unknown)
# 0x7fc828ba115e caffe::db::LMDB::Open()
# 0x7fc828b75644 caffe::DataReader::Body::InternalThreadEntry()
# 0x7fc828cc1470 caffe::InternalThread::entry()
# 0x7fc81f4a8a4a (unknown)
# 0x7fc826a98184 start_thread
# 0x7fc8271b437d (unknown)
# (nil) (unknown)
Aborted (core dumped)
Here is the prototxt I am using:
name: "CaffeNet"
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mirror: true
crop_size: 227
mean_file: "/home/myuser/learning/caffe/data/ilsvrc12/imagenet_mean.binaryproto"
#mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
}
# mean pixel / channel-wise mean instead of mean image
# transform_param {
# crop_size: 227
# mean_value: 104
# mean_value: 117
# mean_value: 123
# mirror: true
# }
data_param {
source: "examples/imagenet/ilsvrc12_train_lmdb"
batch_size: 256
backend: LMDB
}
}
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
mirror: false
crop_size: 227
mean_file: "/home/myuser/learning/caffe/data/ilsvrc12/imagenet_mean.binaryproto"
#mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
}
# mean pixel / channel-wise mean instead of mean image
# transform_param {
# crop_size: 227
# mean_value: 104
# mean_value: 117
# mean_value: 123
# mirror: false
# }
data_param {
source: "/sdc/repository/myuser/Imagenet2012/Imagenet2012trainLMDB"
#source: "examples/imagenet/ilsvrc12_val_lmdb"
batch_size: 50
backend: LMDB
}
}
layer {
name: "conv1"
…
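One thing worth noting in the log above: the fatal line is raised in db_lmdb.hpp and the stack trace passes through caffe::db::LMDB::Open(), which suggests it is an LMDB that could not be opened, while the "Loading mean file" message is simply the last informational line before the failure. The TRAIN layer uses a relative source (examples/imagenet/ilsvrc12_train_lmdb), so whether it resolves depends on the directory Caffe is started from. The following is a minimal sketch for checking this, assuming the prototxt is available as text; the function name and the `base_dir` parameter are illustrative and not part of Caffe.

```python
import os
import re

# Matches uncommented `source: "..."` and `mean_file: "..."` entries.
_PATH_FIELD = re.compile(r'^\s*(source|mean_file)\s*:\s*"([^"]+)"', re.MULTILINE)


def find_missing_paths(prototxt_text, base_dir="."):
    """Return (field, path) pairs that do not exist when resolved from base_dir."""
    missing = []
    for field, path in _PATH_FIELD.findall(prototxt_text):
        # Relative paths are resolved against the directory Caffe would run in.
        resolved = path if os.path.isabs(path) else os.path.join(base_dir, path)
        if not os.path.exists(resolved):
            missing.append((field, path))
    return missing


if __name__ == "__main__":
    example = '''
    mean_file: "/home/myuser/learning/caffe/data/ilsvrc12/imagenet_mean.binaryproto"
    source: "examples/imagenet/ilsvrc12_train_lmdb"
    '''
    for field, path in find_missing_paths(example, base_dir=os.getcwd()):
        print("not found:", field, path)
```

Running it from the same working directory as the training command should show whether the relative LMDB path, rather than the mean file, is the entry that cannot be found.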
I trained an FC network with an HDF5 data layer, then used net surgery to transplant the weights to a convolutional network, then changed the data layer to a probe-suitable data layer, i.e.:
from:
layer {
name: "layer_data_left"
type: "HDF5Data"
top: "data_left"
top: "labels_left"
include {
phase: TRAIN
}
hdf5_data_param {
source: "/home/me/Desktop/trainLeftPatches.txt"
batch_size: 128
}
}
to
layer {
name: "data_left"
type: "Input"
top: "data_right"
input_param { shape: { dim: 1 dim: 1 dim: 1241 dim: 367 } }
}
Is there any reason this would run out of memory?
>>> fc_net.forward()
F0729 20:02:02.205382 6821 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
Aborted (core dumped)
Or, is it more likely that I made a mistake somewhere in surgery & exchanging data layers?
Thank you.
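A rough way to reason about the memory question is to add up the size of every blob the net allocates for the new input shape. The sketch below is a back-of-the-envelope estimate only, assuming float32 blobs; the 4096-channel shape is a made-up example, since the real layer shapes are not shown above. With pycaffe, the actual shapes could be read from `net.blobs[name].data.shape`.

```python
from functools import reduce
from operator import mul

BYTES_PER_FLOAT32 = 4  # assumption: blobs are stored as single-precision floats


def blob_bytes(shape):
    """Bytes needed for one float32 array of the given shape."""
    return reduce(mul, shape, 1) * BYTES_PER_FLOAT32


def total_bytes(shapes):
    """Sum the sizes of all blobs in a {name: shape} mapping."""
    return sum(blob_bytes(shape) for shape in shapes.values())


if __name__ == "__main__":
    shapes = {
        "data_right": (1, 1, 1241, 367),       # input shape from the Input layer above
        "fc_as_conv": (1, 4096, 1241, 367),    # hypothetical full-resolution feature map
    }
    for name, shape in shapes.items():
        print(name, blob_bytes(shape) / 1e6, "MB")
    print("total", total_bytes(shapes) / 1e9, "GB")
```

The input itself is under 2 MB, so if memory runs out it is more likely because some intermediate blob becomes very large once fully connected layers are applied convolutionally over a 1241 x 367 image. Each blob may also hold a gradient array of the same size, which this estimate ignores.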
I trained a GoogleNet model for a while, and now I'd like to restart from a checkpoint, adding a test phase. I have the test already in my train_val.prototxt file, and I added the proper parameters to my solver.prototxt ... but I get an error on the restart:
I0712 15:53:02.615947 47646 net.cpp:278] This network produces output loss2/loss1
I0712 15:53:02.615964 47646 net.cpp:278] This network produces output loss3/loss3
I0712 15:53:02.616109 47646 net.cpp:292] Network initialization done.
F0712 15:53:02.616665 47646 solver.cpp:128] Check failed: param_.test_iter_size() == num_test_nets (1 vs. 0) test_iter must be specified for each test network.
*** Check failure stack trace: ***
# 0x7f550cf70e6d (unknown)
# 0x7f550cf72ced (unknown)
# 0x7f550cf70a5c (unknown)
# 0x7f550cf7363e (unknown)
# 0x7f550d3b605b caffe::Solver<>::InitTestNets()
# 0x7f550d3b63ed caffe::Solver<>::Init()
# 0x7f550d3b6738 caffe::Solver<>::Solver()
# 0x7f550d4fa633 caffe::Creator_SGDSolver<>()
# 0x7f550da5bb76 caffe::SolverRegistry<>::CreateSolver()
# 0x7f550da548f4 train()
# 0x7f550da52316 main
# 0x7f5508f43b15 __libc_start_main
# 0x7f550da52d3d (unknown)
solver.prototxt
train_net: "<my_path>/train_val.prototxt"
test_iter: 1000
test_interval: 4000
test_initialization: false
display: 40
average_loss: 40
base_lr: 0.01
lr_policy: "step"
stepsize: 320000
gamma: 0.96
max_iter: 10000000
momentum: 0.9
weight_decay: 0.0002
snapshot: 40000
snapshot_prefix: "models/<my_path>"
solver_mode: CPU
train_val.prototxt train and test layers:
name: "GoogleNet"
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mirror: true
crop_size: 224
mean_value: 104
mean_value: 117
mean_value: 123
}
data_param {
source: "/<blah>/ilsvrc12_train_lmdb"
batch_size: 32
backend: LMDB
}
}
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
mirror: true
crop_size: 224
mean_value: 104
mean_value: 117
mean_value: 123
}
data_param {
source: "/<blah>/ilsvrc12_val_lmdb"
batch_size: 32
backend: LMDB
}
}
You should change one line in your solver.prototxt from
train_net: "<my_path>/train_val.prototxt"
to
net: "<my_path>/train_val.prototxt"
The Solver does not use the value of "train_net" to initialize a test net, so the test phase you added was not found by the solver.
In fact, the parameters "train_net" and "test_net" are each used to initialize only a train net or only a test net, respectively, while "net" is used for both.
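The rule described above can be sketched as a small pre-flight check on the solver file. This is a simplified reading of the check reported in the log (test_iter entries versus test nets), not Caffe's actual implementation; the function names are illustrative.

```python
import re


def count_field(solver_text, field):
    """Count uncommented `field: ...` lines in a solver definition."""
    pattern = r'^\s*' + re.escape(field) + r'\s*:'
    return len(re.findall(pattern, solver_text, re.MULTILINE))


def check_test_iter(solver_text):
    """Return (ok, summary) for a simplified version of the test_iter check."""
    test_iters = count_field(solver_text, "test_iter")
    test_nets = count_field(solver_text, "test_net")
    generic_nets = count_field(solver_text, "net")
    if generic_nets:
        # A generic `net` can also be instantiated in the TEST phase,
        # so it can absorb test_iter entries not matched by a `test_net`.
        ok = test_iters >= test_nets
    else:
        # Only `train_net` / `test_net` given: every test_iter needs a test_net.
        ok = test_iters == test_nets
    summary = "test_iter=%d, test_net=%d, net=%d" % (test_iters, test_nets, generic_nets)
    return ok, summary


if __name__ == "__main__":
    original = 'train_net: "train_val.prototxt"\ntest_iter: 1000\ntest_interval: 4000\n'
    fixed = 'net: "train_val.prototxt"\ntest_iter: 1000\ntest_interval: 4000\n'
    print(check_test_iter(original))
    print(check_test_iter(fixed))
```

With the original solver, one test_iter is declared but no net can serve as a test net, which corresponds to the "(1 vs. 0)" in the failed check; switching to `net` removes the mismatch.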
From time to time, a site I'm maintaining gets failed GET requests on some resources. The status is returned as (failed) and the type as Pending. The headers contain nothing more than the request itself, with no response whatsoever.
The server is running Drupal and Varnish.
Any thoughts on what causes the failure or where to start debugging?
This might be related to "What kind of network error is Chrome encountering when 'Status = (failed)' and 'Type = undefined'", even though the type shows a different message in my case. Could someone confirm this? If so, how could I debug this?
Thanks in advance.
EDIT: I just installed Wireshark to try to debug a bit more.
Here's the entire follow-up on the failed request.
A dump of frames 94645, 94651, 94686 and 94688:
No. Time Source Destination Protocol Length Info
94645 211.219995 192.168.0.101 85.134.37.196 HTTP 1192 [TCP Retransmission] GET /sites/default/files/css/css_4PXz_aZSHtm7FWHqYsMdm7sl9C4802BFn9tXlePfpJU.css HTTP/1.1
Frame 94645: 1192 bytes on wire (9536 bits), 1192 bytes captured (9536 bits)
Arrival Time: Nov 7, 2012 22:55:18.002267000 EET
Epoch Time: 1352321718.002267000 seconds
[Time delta from previous captured frame: 0.006199000 seconds]
[Time delta from previous displayed frame: 0.815800000 seconds]
[Time since reference or first frame: 211.219995000 seconds]
Frame Number: 94645
Frame Length: 1192 bytes (9536 bits)
Capture Length: 1192 bytes (9536 bits)
[Frame is marked: True]
[Frame is ignored: False]
[Protocols in frame: eth:ip:tcp:http]
[Coloring Rule Name: Bad TCP]
[Coloring Rule String: tcp.analysis.flags]
Ethernet II, Src: HonHaiPr_e7:39:2f (38:59:f9:e7:39:2f), Dst: D-LinkIn_2f:a4:a4 (14:d6:4d:2f:a4:a4)
Destination: D-LinkIn_2f:a4:a4 (14:d6:4d:2f:a4:a4)
Address: D-LinkIn_2f:a4:a4 (14:d6:4d:2f:a4:a4)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
Source: HonHaiPr_e7:39:2f (38:59:f9:e7:39:2f)
Address: HonHaiPr_e7:39:2f (38:59:f9:e7:39:2f)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
Type: IP (0x0800)
Internet Protocol Version 4, Src: 192.168.0.101 (192.168.0.101), Dst: 85.134.37.196 (85.134.37.196)
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00: Not-ECT (Not ECN-Capable Transport))
0000 00.. = Differentiated Services Codepoint: Default (0x00)
.... ..00 = Explicit Congestion Notification: Not-ECT (Not ECN-Capable Transport) (0x00)
Total Length: 1178
Identification: 0x4e23 (20003)
Flags: 0x02 (Don't Fragment)
0... .... = Reserved bit: Not set
.1.. .... = Don't fragment: Set
..0. .... = More fragments: Not set
Fragment offset: 0
Time to live: 64
Protocol: TCP (6)
Header checksum: 0xabe3 [correct]
[Good: True]
[Bad: False]
Source: 192.168.0.101 (192.168.0.101)
Destination: 85.134.37.196 (85.134.37.196)
Transmission Control Protocol, Src Port: 51714 (51714), Dst Port: http (80), Seq: 1, Ack: 1, Len: 1126
Source port: 51714 (51714)
Destination port: http (80)
[Stream index: 5205]
Sequence number: 1 (relative sequence number)
[Next sequence number: 1127 (relative sequence number)]
Acknowledgement number: 1 (relative ack number)
Header length: 32 bytes
Flags: 0x018 (PSH, ACK)
000. .... .... = Reserved: Not set
...0 .... .... = Nonce: Not set
.... 0... .... = Congestion Window Reduced (CWR): Not set
.... .0.. .... = ECN-Echo: Not set
.... ..0. .... = Urgent: Not set
.... ...1 .... = Acknowledgement: Set
.... .... 1... = Push: Set
.... .... .0.. = Reset: Not set
.... .... ..0. = Syn: Not set
.... .... ...0 = Fin: Not set
Window size value: 229
[Calculated window size: 14656]
[Window size scaling factor: 64]
Checksum: 0xb15e [validation disabled]
[Good Checksum: False]
[Bad Checksum: False]
Options: (12 bytes)
No-Operation (NOP)
No-Operation (NOP)
Timestamps: TSval 30409363, TSecr 1723048723
Kind: Timestamp (8)
Length: 10
Timestamp value: 30409363
Timestamp echo reply: 1723048723
[SEQ/ACK analysis]
[Bytes in flight: 1126]
[TCP Analysis Flags]
[This frame is a (suspected) retransmission]
[Expert Info (Note/Sequence): Retransmission (suspected)]
[Message: Retransmission (suspected)]
[Severity level: Note]
[Group: Sequence]
[The RTO for this segment was: 1.480371000 seconds]
[RTO based on delta from frame: 92823]
Hypertext Transfer Protocol
GET /sites/default/files/css/css_4PXz_aZSHtm7FWHqYsMdm7sl9C4802BFn9tXlePfpJU.css HTTP/1.1\r\n
[Expert Info (Chat/Sequence): GET /sites/default/files/css/css_4PXz_aZSHtm7FWHqYsMdm7sl9C4802BFn9tXlePfpJU.css HTTP/1.1\r\n]
[Message: GET /sites/default/files/css/css_4PXz_aZSHtm7FWHqYsMdm7sl9C4802BFn9tXlePfpJU.css HTTP/1.1\r\n]
[Severity level: Chat]
[Group: Sequence]
Request Method: GET
Request URI: /sites/default/files/css/css_4PXz_aZSHtm7FWHqYsMdm7sl9C4802BFn9tXlePfpJU.css
Request Version: HTTP/1.1
Host: www.snellman.fi\r\n
Connection: keep-alive\r\n
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.94 Safari/537.4\r\n
Accept: text/css,*/*;q=0.1\r\n
Referer: http://www.snellman.fi/sv\r\n
Accept-Encoding: gzip,deflate,sdch\r\n
Accept-Language: en-US,en;q=0.8\r\n
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3\r\n
[truncated] Cookie: v_tid=82524dc4552b4cd4a22d1fa4e7e78d72; Drupal.toolbar.collapsed=0; ctools-collapsible-state=views-ui-advanced-column-uusimmat_reseptit%3A1%2Cviews-ui-advanced-column-Related%3A1; Drupal.tableDrag.showWeight=0; SESS4049
\r\n
[Full request URI: http://www.snellman.fi/sites/default/files/css/css_4PXz_aZSHtm7FWHqYsMdm7sl9C4802BFn9tXlePfpJU.css]
---------------------------------------------------------------------------------------------------------------------------------------------------------
No. Time Source Destination Protocol Length Info
94651 211.228592 85.134.37.196 192.168.0.101 TCP 66 [TCP Dup ACK 92860#3] http > 51714 [ACK] Seq=1 Ack=1 Win=14592 Len=0 TSval=1723049571 TSecr=1723044620
Frame 94651: 66 bytes on wire (528 bits), 66 bytes captured (528 bits)
Arrival Time: Nov 7, 2012 22:55:18.010864000 EET
Epoch Time: 1352321718.010864000 seconds
[Time delta from previous captured frame: 0.001166000 seconds]
[Time delta from previous displayed frame: 0.008597000 seconds]
[Time since reference or first frame: 211.228592000 seconds]
Frame Number: 94651
Frame Length: 66 bytes (528 bits)
Capture Length: 66 bytes (528 bits)
[Frame is marked: True]
[Frame is ignored: False]
[Protocols in frame: eth:ip:tcp]
[Coloring Rule Name: Bad TCP]
[Coloring Rule String: tcp.analysis.flags]
Ethernet II, Src: D-LinkIn_2f:a4:a4 (14:d6:4d:2f:a4:a4), Dst: HonHaiPr_e7:39:2f (38:59:f9:e7:39:2f)
Destination: HonHaiPr_e7:39:2f (38:59:f9:e7:39:2f)
Address: HonHaiPr_e7:39:2f (38:59:f9:e7:39:2f)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
Source: D-LinkIn_2f:a4:a4 (14:d6:4d:2f:a4:a4)
Address: D-LinkIn_2f:a4:a4 (14:d6:4d:2f:a4:a4)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
Type: IP (0x0800)
Internet Protocol Version 4, Src: 85.134.37.196 (85.134.37.196), Dst: 192.168.0.101 (192.168.0.101)
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00: Not-ECT (Not ECN-Capable Transport))
0000 00.. = Differentiated Services Codepoint: Default (0x00)
.... ..00 = Explicit Congestion Notification: Not-ECT (Not ECN-Capable Transport) (0x00)
Total Length: 52
Identification: 0x8b62 (35682)
Flags: 0x02 (Don't Fragment)
0... .... = Reserved bit: Not set
.1.. .... = Don't fragment: Set
..0. .... = More fragments: Not set
Fragment offset: 0
Time to live: 58
Protocol: TCP (6)
Header checksum: 0x790a [correct]
[Good: True]
[Bad: False]
Source: 85.134.37.196 (85.134.37.196)
Destination: 192.168.0.101 (192.168.0.101)
Transmission Control Protocol, Src Port: http (80), Dst Port: 51714 (51714), Seq: 1, Ack: 1, Len: 0
Source port: http (80)
Destination port: 51714 (51714)
[Stream index: 5205]
Sequence number: 1 (relative sequence number)
Acknowledgement number: 1 (relative ack number)
Header length: 32 bytes
Flags: 0x010 (ACK)
000. .... .... = Reserved: Not set
...0 .... .... = Nonce: Not set
.... 0... .... = Congestion Window Reduced (CWR): Not set
.... .0.. .... = ECN-Echo: Not set
.... ..0. .... = Urgent: Not set
.... ...1 .... = Acknowledgement: Set
.... .... 0... = Push: Not set
.... .... .0.. = Reset: Not set
.... .... ..0. = Syn: Not set
.... .... ...0 = Fin: Not set
Window size value: 114
[Calculated window size: 14592]
[Window size scaling factor: 128]
Checksum: 0xd246 [validation disabled]
[Good Checksum: False]
[Bad Checksum: False]
Options: (12 bytes)
No-Operation (NOP)
No-Operation (NOP)
Timestamps: TSval 1723049571, TSecr 1723044620
Kind: Timestamp (8)
Length: 10
Timestamp value: 1723049571
Timestamp echo reply: 1723044620
[SEQ/ACK analysis]
[TCP Analysis Flags]
[This is a TCP duplicate ack]
[Duplicate ACK #: 3]
[Duplicate to the ACK in frame: 92860]
[Expert Info (Note/Sequence): Duplicate ACK (#3)]
[Message: Duplicate ACK (#3)]
[Severity level: Note]
[Group: Sequence]
---------------------------------------------------------------------------------------------------------------------------------------------------------
No. Time Source Destination Protocol Length Info
94686 211.305208 85.134.37.196 192.168.0.101 TCP 66 http > 51714 [FIN, ACK] Seq=1 Ack=1 Win=14592 Len=0 TSval=1723049647 TSecr=1723044620
Frame 94686: 66 bytes on wire (528 bits), 66 bytes captured (528 bits)
Arrival Time: Nov 7, 2012 22:55:18.087480000 EET
Epoch Time: 1352321718.087480000 seconds
[Time delta from previous captured frame: 0.000026000 seconds]
[Time delta from previous displayed frame: 0.076616000 seconds]
[Time since reference or first frame: 211.305208000 seconds]
Frame Number: 94686
Frame Length: 66 bytes (528 bits)
Capture Length: 66 bytes (528 bits)
[Frame is marked: True]
[Frame is ignored: False]
[Protocols in frame: eth:ip:tcp]
[Coloring Rule Name: HTTP]
[Coloring Rule String: http || tcp.port == 80]
Ethernet II, Src: D-LinkIn_2f:a4:a4 (14:d6:4d:2f:a4:a4), Dst: HonHaiPr_e7:39:2f (38:59:f9:e7:39:2f)
Destination: HonHaiPr_e7:39:2f (38:59:f9:e7:39:2f)
Address: HonHaiPr_e7:39:2f (38:59:f9:e7:39:2f)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
Source: D-LinkIn_2f:a4:a4 (14:d6:4d:2f:a4:a4)
Address: D-LinkIn_2f:a4:a4 (14:d6:4d:2f:a4:a4)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
Type: IP (0x0800)
Internet Protocol Version 4, Src: 85.134.37.196 (85.134.37.196), Dst: 192.168.0.101 (192.168.0.101)
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00: Not-ECT (Not ECN-Capable Transport))
0000 00.. = Differentiated Services Codepoint: Default (0x00)
.... ..00 = Explicit Congestion Notification: Not-ECT (Not ECN-Capable Transport) (0x00)
Total Length: 52
Identification: 0x8b63 (35683)
Flags: 0x02 (Don't Fragment)
0... .... = Reserved bit: Not set
.1.. .... = Don't fragment: Set
..0. .... = More fragments: Not set
Fragment offset: 0
Time to live: 58
Protocol: TCP (6)
Header checksum: 0x7909 [correct]
[Good: True]
[Bad: False]
Source: 85.134.37.196 (85.134.37.196)
Destination: 192.168.0.101 (192.168.0.101)
Transmission Control Protocol, Src Port: http (80), Dst Port: 51714 (51714), Seq: 1, Ack: 1, Len: 0
Source port: http (80)
Destination port: 51714 (51714)
[Stream index: 5205]
Sequence number: 1 (relative sequence number)
Acknowledgement number: 1 (relative ack number)
Header length: 32 bytes
Flags: 0x011 (FIN, ACK)
000. .... .... = Reserved: Not set
...0 .... .... = Nonce: Not set
.... 0... .... = Congestion Window Reduced (CWR): Not set
.... .0.. .... = ECN-Echo: Not set
.... ..0. .... = Urgent: Not set
.... ...1 .... = Acknowledgement: Set
.... .... 0... = Push: Not set
.... .... .0.. = Reset: Not set
.... .... ..0. = Syn: Not set
.... .... ...1 = Fin: Set
[Expert Info (Chat/Sequence): Connection finish (FIN)]
[Message: Connection finish (FIN)]
[Severity level: Chat]
[Group: Sequence]
Window size value: 114
[Calculated window size: 14592]
[Window size scaling factor: 128]
Checksum: 0xd1f9 [validation disabled]
[Good Checksum: False]
[Bad Checksum: False]
Options: (12 bytes)
No-Operation (NOP)
No-Operation (NOP)
Timestamps: TSval 1723049647, TSecr 1723044620
Kind: Timestamp (8)
Length: 10
Timestamp value: 1723049647
Timestamp echo reply: 1723044620
---------------------------------------------------------------------------------------------------------------------------------------------------------
No. Time Source Destination Protocol Length Info
94688 211.305660 192.168.0.101 85.134.37.196 TCP 66 51714 > http [FIN, ACK] Seq=1127 Ack=2 Win=14656 Len=0 TSval=30409384 TSecr=1723049647
Frame 94688: 66 bytes on wire (528 bits), 66 bytes captured (528 bits)
Arrival Time: Nov 7, 2012 22:55:18.087932000 EET
Epoch Time: 1352321718.087932000 seconds
[Time delta from previous captured frame: 0.000264000 seconds]
[Time delta from previous displayed frame: 0.000452000 seconds]
[Time since reference or first frame: 211.305660000 seconds]
Frame Number: 94688
Frame Length: 66 bytes (528 bits)
Capture Length: 66 bytes (528 bits)
[Frame is marked: True]
[Frame is ignored: False]
[Protocols in frame: eth:ip:tcp]
[Coloring Rule Name: HTTP]
[Coloring Rule String: http || tcp.port == 80]
Ethernet II, Src: HonHaiPr_e7:39:2f (38:59:f9:e7:39:2f), Dst: D-LinkIn_2f:a4:a4 (14:d6:4d:2f:a4:a4)
Destination: D-LinkIn_2f:a4:a4 (14:d6:4d:2f:a4:a4)
Address: D-LinkIn_2f:a4:a4 (14:d6:4d:2f:a4:a4)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
Source: HonHaiPr_e7:39:2f (38:59:f9:e7:39:2f)
Address: HonHaiPr_e7:39:2f (38:59:f9:e7:39:2f)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
Type: IP (0x0800)
Internet Protocol Version 4, Src: 192.168.0.101 (192.168.0.101), Dst: 85.134.37.196 (85.134.37.196)
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00: Not-ECT (Not ECN-Capable Transport))
0000 00.. = Differentiated Services Codepoint: Default (0x00)
.... ..00 = Explicit Congestion Notification: Not-ECT (Not ECN-Capable Transport) (0x00)
Total Length: 52
Identification: 0x4e24 (20004)
Flags: 0x02 (Don't Fragment)
0... .... = Reserved bit: Not set
.1.. .... = Don't fragment: Set
..0. .... = More fragments: Not set
Fragment offset: 0
Time to live: 64
Protocol: TCP (6)
Header checksum: 0xb048 [correct]
[Good: True]
[Bad: False]
Source: 192.168.0.101 (192.168.0.101)
Destination: 85.134.37.196 (85.134.37.196)
Transmission Control Protocol, Src Port: 51714 (51714), Dst Port: http (80), Seq: 1127, Ack: 2, Len: 0
Source port: 51714 (51714)
Destination port: http (80)
[Stream index: 5205]
Sequence number: 1127 (relative sequence number)
Acknowledgement number: 2 (relative ack number)
Header length: 32 bytes
Flags: 0x011 (FIN, ACK)
000. .... .... = Reserved: Not set
...0 .... .... = Nonce: Not set
.... 0... .... = Congestion Window Reduced (CWR): Not set
.... .0.. .... = ECN-Echo: Not set
.... ..0. .... = Urgent: Not set
.... ...1 .... = Acknowledgement: Set
.... .... 0... = Push: Not set
.... .... .0.. = Reset: Not set
.... .... ..0. = Syn: Not set
.... .... ...1 = Fin: Set
[Expert Info (Chat/Sequence): Connection finish (FIN)]
[Message: Connection finish (FIN)]
[Severity level: Chat]
[Group: Sequence]
Window size value: 229
[Calculated window size: 14656]
[Window size scaling factor: 64]
Checksum: 0x3c7e [validation disabled]
[Good Checksum: False]
[Bad Checksum: False]
Options: (12 bytes)
No-Operation (NOP)
No-Operation (NOP)
Timestamps: TSval 30409384, TSecr 1723049647
Kind: Timestamp (8)
Length: 10
Timestamp value: 30409384
Timestamp echo reply: 1723049647
[SEQ/ACK analysis]
[This is an ACK to the segment in frame: 94686]
[The RTT to ACK the segment was: 0.000452000 seconds]
In my case, Adblock was denying the request (because it was a Twitter request and I block most Twitter things).
I was getting this exact problem in Chrome, and for me it was the offline cache (application cache) that was causing the trouble.
I have an ASP.NET HTML5 application designed for mobile use with offline support, so I had the manifest attribute set like this:
<html manifest="Manifest.ashx">
An .ashx handler served the manifest file dynamically. During development I had removed the manifest attribute from the main page, but another, seldom-used page still had it. The latest files in the project (one image and two JS files) had not been added to the manifest. When I visited the other page and then went back to the main page, the files missing from the manifest started to fail in the logs, with no friendly error message, just type: pending and "failed". This was on desktop Chrome, without ever going offline.
Adding the files to the manifest list solved the problem for me.
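One way to catch this kind of omission early is to compare the resources a page uses against the entries in the manifest. Below is a minimal sketch in Python, assuming the standard `CACHE MANIFEST` text format. The function names and the file names in the example are made up for illustration and are not taken from my project.

```python
def parse_cached_entries(manifest_text):
    """Return the set of URLs listed in the explicit (CACHE) section."""
    lines = manifest_text.splitlines()
    if not lines or lines[0].strip() != "CACHE MANIFEST":
        raise ValueError("not an application cache manifest")
    entries = set()
    # Entries that appear before any section header are cached explicitly.
    section = "CACHE"
    for raw in lines[1:]:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line in ("CACHE:", "NETWORK:", "FALLBACK:", "SETTINGS:"):
            section = line[:-1]
            continue
        if section == "CACHE":
            entries.add(line)
    return entries


def missing_from_manifest(manifest_text, page_resources):
    """Return the resources a page uses that the manifest does not list."""
    cached = parse_cached_entries(manifest_text)
    return sorted(set(page_resources) - cached)


# Hypothetical manifest and resource list, for illustration only.
manifest = """CACHE MANIFEST
# v1
index.html
js/app.js

NETWORK:
api/status
"""
resources = ["index.html", "js/app.js", "js/new.js", "img/logo.png"]
print(missing_from_manifest(manifest, resources))
```

With this input the script reports `img/logo.png` and `js/new.js` as missing. The check only looks at explicit entries, so anything meant to be served through a `NETWORK:` or `FALLBACK:` rule would need to be excluded from the resource list first.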
My specific issue was solved by upgrading from Varnish 3.0.2 to 3.0.3.