pix2pixHD shows error after changing datasets - deep-learning

I was trying out the pix2pixHD code from the link below.
https://github.com/NVIDIA/pix2pixHD
The train.py worked with default images (in datasets/cityscapes). However, after changing images in the dataset, it shows the error below.
model [Pix2PixHDModel] was created
create web directory ./checkpoints/label2city/web...
Traceback (most recent call last):
File "/home/shimada/venv/py2.7/projects/Hiwi/pix2pixHD/train.py", line 58, in <module>
Variable(data['image']), Variable(data['feat']), infer=save_fake)
File "/home/shimada/venv/py2.7/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 325, in __call__
result = self.forward(*input, **kwargs)
File "/home/shimada/venv/py2.7/local/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 66, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/shimada/venv/py2.7/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 325, in __call__
result = self.forward(*input, **kwargs)
File "/home/shimada/venv/py2.7/projects/Hiwi/pix2pixHD/models/pix2pixHD_model.py", line 141, in forward
fake_image = self.netG.forward(input_concat)
File "/home/shimada/venv/py2.7/projects/Hiwi/pix2pixHD/models/networks.py", line 213, in forward
return self.model(input)
File "/home/shimada/venv/py2.7/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 325, in __call__
result = self.forward(*input, **kwargs)
File "/home/shimada/venv/py2.7/local/lib/python2.7/site-packages/torch/nn/modules/container.py", line 67, in forward
input = module(input)
File "/home/shimada/venv/py2.7/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 325, in __call__
result = self.forward(*input, **kwargs)
File "/home/shimada/venv/py2.7/local/lib/python2.7/site-packages/torch/nn/modules/conv.py", line 277, in forward
self.padding, self.dilation, self.groups)
File "/home/shimada/venv/py2.7/local/lib/python2.7/site-packages/torch/nn/functional.py", line 90, in conv2d
return f(input, weight, bias)
RuntimeError: Given groups=1, weight[64, 36, 7, 7], so expected input[1, 39, 518, 1030] to have 36 channels, but got 39 channels instead
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.c line=184 error=59 : device-side assert triggered
terminate called after throwing an instance of 'std::runtime_error'
what(): cuda runtime error (59) : device-side assert triggered at /pytorch/torch/lib/THC/generic/THCStorage.c:184
bash: line 1: 10965 Aborted (core dumped) env "PYCHARM_HOSTED"="1" "PYTHONUNBUFFERED"="1" "PYTHONIOENCODING"="UTF-8" "PYCHARM_MATPLOTLIB_PORT"="42188" "JETBRAINS_REMOTE_RUN"="1" "PYTHONPATH"="/home/shimada/.pycharm_helpers/pycharm_matplotlib_backend:/home/shimada/venv/py2.7/projects/Hiwi/pix2pixHD" /home/shimada/venv/py2.7/bin/python -u /home/shimada/venv/py2.7/projects/Hiwi/pix2pixHD/train.py
I changed the images with same size (width 2048, hight 1024), same extension (.png) and gave the same names. Why doesn't it work?

It looks like your original image/ground truth data is grayscale. In that case you have to define --input_nc 1 --output_nc 1 means grayscale. You also have to change in pix2pixHD code to load grayscale images.

Related

Pytorch torchvision.transforms execute randomly?

I am doing this transformation:
self.transform = transforms.Compose( {
transforms.Resize((224, 224)),
transforms.ToTensor(),
transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
} )
and then
image = Image.open(img_name)
if self.transform:
image = self.transform(image)
this works for the first epoch then how the hell it crashes for the second epoch?
why the f normalize getting PIL-image and not torch.tensor? is the execution of each transforms Compose items random?
Traceback (most recent call last): File
"/home/ubuntu/projects/ssl/src/train_supervised.py", line 63, in
main() File "/home/ubuntu/projects/ssl/src/train_supervised.py", line 60, in main
train() File "/home/ubuntu/projects/ssl/src/train_supervised.py", line 45, in train
for i, data in enumerate(tqdm_): File "/home/ubuntu/anaconda3/envs/pytorch-1.11.0/lib/python3.9/site-packages/tqdm/std.py",
line 1195, in iter
for obj in iterable: File "/home/ubuntu/anaconda3/envs/pytorch-1.11.0/lib/python3.9/site-packages/torch/utils/data/dataloader.py",
line 530, in next
data = self._next_data() File "/home/ubuntu/anaconda3/envs/pytorch-1.11.0/lib/python3.9/site-packages/torch/utils/data/dataloader.py",
line 1224, in _next_data
return self._process_data(data) File "/home/ubuntu/anaconda3/envs/pytorch-1.11.0/lib/python3.9/site-packages/torch/utils/data/dataloader.py",
line 1250, in _process_data
data.reraise() File "/home/ubuntu/anaconda3/envs/pytorch-1.11.0/lib/python3.9/site-packages/torch/_utils.py",
line 457, in reraise
raise exception TypeError: Caught TypeError in DataLoader worker process 0. Original Traceback (most recent call last): File
"/home/ubuntu/anaconda3/envs/pytorch-1.11.0/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index) File "/home/ubuntu/anaconda3/envs/pytorch-1.11.0/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index] File
"/home/ubuntu/anaconda3/envs/pytorch-1.11.0/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in
data = [self.dataset[idx] for idx in possibly_batched_index] File "/home/ubuntu/projects/ssl/src/data_loader.py", line 44, in
getitem
image = self.transform(image) File "/home/ubuntu/anaconda3/envs/pytorch-1.11.0/lib/python3.9/site-packages/torchvision/transforms/transforms.py", line 95, in call
img = t(img) File "/home/ubuntu/anaconda3/envs/pytorch-1.11.0/lib/python3.9/site-packages/torch/nn/modules/module.py",
line 1110, in _call_impl
return forward_call(*input, **kwargs) File "/home/ubuntu/anaconda3/envs/pytorch-1.11.0/lib/python3.9/site-packages/torchvision/transforms/transforms.py", line 270, in forward
return F.normalize(tensor, self.mean, self.std, self.inplace) File
"/home/ubuntu/anaconda3/envs/pytorch-1.11.0/lib/python3.9/site-packages/torchvision/transforms/functional.py", line 341, in normalize
raise TypeError(f"Input tensor should be a torch tensor. Got {type(tensor)}.") TypeError: Input tensor should be a torch tensor.
Got <class 'PIL.Image.Image'>.
Python set iteration order is not deterministic. Kindly use list instead ([] rather than {}).

'ner_ontonotes_bert_mult' model Custom train

When I want to train this 'ner_ontonotes_bert_mult' model with my custom dataset it is showing the error below. (I have saved my datset in the ~\.deeppavlov\downloads\ontonotes folder that was mentioned in [deeppavlov documentation][1]. )
PS C:\Users\sghanta\Desktop\NER> & c:/Users/sghanta/Desktop/NER/env/Scripts/Activate.ps1
(env) PS C:\Users\sghanta\Desktop\NER> & c:/Users/sghanta/Desktop/NER/env/Scripts/python.exe c:/Users/sghanta/Desktop/NER/train_model.py
C:\Users\sghanta\Desktop\NER\env\lib\site-packages\numpy\_distributor_init.py:32: UserWarning: loaded more than 1 DLL from .libs:
C:\Users\sghanta\Desktop\NER\env\lib\site-packages\numpy\.libs\libopenblas.PYQHXLVVQ7VESDPUVUADXEVJOBGHJPAY.gfortran-win_amd64.dll
C:\Users\sghanta\Desktop\NER\env\lib\site-packages\numpy\.libs\libopenblas.WCDJNK7YVMPZQ2ME2ZZHJJRJ3JIKNDB7.gfortran-win_amd64.dll
stacklevel=1)
Traceback (most recent call last):
File "c:/Users/sghanta/Desktop/NER/train_model.py", line 12, in <module>
ner_model = train_model(configs.ner.ner_ontonotes_bert_mult)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\deeppavlov\__init__.py", line 29, in train_model
train_evaluate_model_from_config(config, download=download, recursive=recursive)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\deeppavlov\core\commands\train.py", line 92, in train_evaluate_model_from_config
data = read_data_by_config(config)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\deeppavlov\core\commands\train.py", line 58, in read_data_by_config
return reader.read(data_path, **reader_config)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\deeppavlov\dataset_readers\conll2003_reader.py", line 56, in read
dataset[name] = self.parse_ner_file(file_name)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\deeppavlov\dataset_readers\conll2003_reader.py", line 106, in parse_ner_file
raise Exception(f"Input is not valid {line}")
Exception: Input is not valid
O
(env) PS C:\Users\sghanta\Desktop\NER>
After cleaning the dataset the above error has gone but this is the new error.
New Error
2021-08-12 02:43:35.335 ERROR in 'deeppavlov.core.common.params'['params'] at line 112: Exception in <class 'deeppavlov.models.bert.bert_sequence_tagger.BertSequenceTagger'>
Traceback (most recent call last):
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\tensorflow_core\python\client\session.py", line 1365, in _do_call
return fn(*args)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\tensorflow_core\python\client\session.py", line 1350, in _run_fn
target_list, run_metadata)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\tensorflow_core\python\client\session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [13,13] rhs shape= [37,37]
[[{{node save/Assign_76}}]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\tensorflow_core\python\training\saver.py", line 1290, in restore
{self.saver_def.filename_tensor_name: save_path})
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\tensorflow_core\python\client\session.py", line 956, in run
run_metadata_ptr)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\tensorflow_core\python\client\session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\tensorflow_core\python\client\session.py", line 1359, in _do_run
run_metadata)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\tensorflow_core\python\client\session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [13,13] rhs shape= [37,37]
[[node save/Assign_76 (defined at C:\Users\sghanta\Desktop\NER\env\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]
Original stack trace for 'save/Assign_76':
File "c:/Users/sghanta/Desktop/NER/train_model.py", line 12, in <module>
ner_model = train_model(configs.ner.ner_ontonotes_bert_mult)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\deeppavlov\__init__.py", line 29, in train_model
train_evaluate_model_from_config(config, download=download, recursive=recursive)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\deeppavlov\core\commands\train.py", line 121, in train_evaluate_model_from_config
trainer.train(iterator)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\deeppavlov\core\trainers\nn_trainer.py", line 334, in train
self.fit_chainer(iterator)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\deeppavlov\core\trainers\fit_trainer.py", line 104, in fit_chainer
component = from_params(component_config, mode='train')
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\deeppavlov\core\common\params.py", line 106, in from_params
component = obj(**dict(config_params, **kwargs))
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\deeppavlov\core\models\tf_backend.py", line 76, in __call__
obj.__init__(*args, **kwargs)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\deeppavlov\core\models\tf_backend.py", line 28, in _wrapped
return func(*args, **kwargs)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\deeppavlov\models\bert\bert_sequence_tagger.py", line 529, in __init__
**kwargs)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\deeppavlov\models\bert\bert_sequence_tagger.py", line 259, in __init__
self.load()
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\deeppavlov\core\models\tf_backend.py", line 28, in _wrapped
return func(*args, **kwargs)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\deeppavlov\models\bert\bert_sequence_tagger.py", line 457, in load
return super().load(exclude_scopes=exclude_scopes, **kwargs)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\deeppavlov\core\models\tf_model.py", line 251, in load
return super().load(exclude_scopes=exclude_scopes, **kwargs)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\deeppavlov\core\models\tf_model.py", line 54, in load
saver = tf.train.Saver(var_list)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\tensorflow_core\python\training\saver.py", line 828, in __init__
self.build()
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\tensorflow_core\python\training\saver.py", line 840, in build
self._build(self._filename, build_save=True, build_restore=True)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\tensorflow_core\python\training\saver.py", line 878, in _build
build_restore=build_restore)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\tensorflow_core\python\training\saver.py", line 508, in _build_internal
restore_sequentially, reshape)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\tensorflow_core\python\training\saver.py", line 350, in _AddRestoreOps
assign_ops.append(saveable.restore(saveable_tensors, shapes))
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\tensorflow_core\python\training\saving\saveable_object_util.py", line 73, in restore
self.op.get_shape().is_fully_defined())
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\tensorflow_core\python\ops\state_ops.py", line 227, in assign
validate_shape=validate_shape)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\tensorflow_core\python\ops\gen_state_ops.py", line 66, in assign
use_locking=use_locking, name=name)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\tensorflow_core\python\framework\op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "C:\Users\sghanta\Desktop\NER\env\lib\site-packages\tensorflow_core\python\framework\ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
Can anyone explain how to solve it.
[1]: http://docs.deeppavlov.ai/en/master/features/models/ner.html
conll2003_reader dataset reader failed to parse the following line:
O
conll2003_reader dataset reader expects that line is either empty or contains a token and a label. In your case only label is present.
So, I would suggest to clean your data from empty lines with labels.
Sample text from DeepPavlov docs:
EU B-ORG
rejects O
the O
call O
of O
Germany B-LOC
to O
boycott O
lamb O
from O
Great B-LOC
Britain I-LOC
. O
China B-LOC

How to set up rllib multi-agent PPO?

I have a very simple multi-agent environment set up for use with ray.rllib, and I'm trying to run a simple baseline test of a PPO vs. Random Policy training scenario as follows:
register_env("my_env", lambda _: MyEnv(num_agents=2))
mock = MyEnv()
obs_space = mock.observation_space
act_space = mock.action_space
tune.run(
"PPO",
stop={"training_iteration": args.num_iters},
config={
"env": "my_env",
"num_gpus":1,
"multiagent": {
"policies": {
"ppo_policy": (None, obs_space, act_space, {}),
"random": (RandomPolicy, obs_space, act_space, {}),
},
"policy_mapping_fn": (
lambda agent_id: {1:"appo_policy", 2:"random"}[agent_id]),
},
},
)
When testing this, I receive an error as follows:
Traceback (most recent call last):
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 467, in _process_trial
result = self.trial_executor.fetch_result(trial)
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 381, in fetch_result
result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT)
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/worker.py", line 1513, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::PPO.__init__() (pid=18163, ip=192.168.1.25)
File "python/ray/_raylet.pyx", line 414, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 449, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 450, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 452, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 407, in ray._raylet.execute_task.function_executor
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 90, in __init__
Trainer.__init__(self, config, env, logger_creator)
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 455, in __init__
super().__init__(config, logger_creator)
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/tune/trainable.py", line 174, in __init__
self._setup(copy.deepcopy(self.config))
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 596, in _setup
self._init(self.config, self.env_creator)
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 129, in _init
self.optimizer = make_policy_optimizer(self.workers, config)
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/rllib/agents/ppo/ppo.py", line 95, in choose_policy_optimizer
shuffle_sequences=config["shuffle_sequences"])
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/rllib/optimizers/multi_gpu_optimizer.py", line 99, in __init__
"Only TF graph policies are supported with multi-GPU. "
ValueError: Only TF graph policies are supported with multi-GPU. Try setting `simple_optimizer=True` instead.
I tried setting simple_optimizer:True in the config, but that gave me a NotImplementedError in the set_weights function of the rllib policy class...
I switched out the "PPO" in the config for "PG" and that ran fine, so it's unlikely anything to do with how I defined my environment. Any ideas on how to fix this?
Take a look at this issue. You are supposed to define:
def get_weights(self):
return None

Run python errors.use kereas to implement CNN

I am learning Deep Learning and want to use python-kereas to implement CNN, but when I run in command, it looks like some errors.
This is my source code. https://github.com/lijhong/CNN-kereas.git
And my fault is like this:
Traceback (most recent call last):
File "/home/ah0818lijhong/CNN-kereas/cnn-kereas.py", line 167, in <module>
model.fit(x_train, y_train,epochs=3)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/keras/models.py", line 845, in fit
initial_epoch=initial_epoch)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 1485, in fit
initial_epoch=initial_epoch)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 1140, in _fit_loop
outs = f(ins_batch)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2073, in __call__
feed_dict=feed_dict)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 778, in run
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 778, in run
run_metadata_ptr)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 982, in _run
feed_dict_string, options, run_metadata)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1032, in _do_run
target_list, options, run_metadata)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1052, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,868] = 115873 is not in [0, 20001)
[[Node: embedding_1/Gather = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](embedding_1/embeddi
ngs/read, _recv_embedding_1_input_0)]]
Caused by op u'embedding_1/Gather', defined at:
File "/home/ah0818lijhong/CNN-kereas/cnn-kereas.py", line 122, in <module>
model_left.add(embedding_layer)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/keras/models.py", line 422, in add
layer(x)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/keras/engine/topology.py", line 554, in __call__
output = self.call(inputs, **kwargs)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/keras/layers/embeddings.py", line 119, in call
out = K.gather(self.embeddings, inputs)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 966, in gather
return tf.gather(reference, indices)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1207, in gather
validate_indices=validate_indices, name=name)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): indices[0,868] = 115873 is not in [0, 20001)
[[Node: embedding_1/Gather = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](embedding_1/embeddi
ngs/read, _recv_embedding_1_input_0)]]
I hope someone can help me fix it.

Google Datastore SSL errors in Python from Google Compute Instances

I have a Python Django application running on a Google Compute instance. It is using gcloudoem to interface from Django to Google Datastore. gcloudoem uses the same underlying code to communicate with Datastore as gcloud-python 0.5.x
At what seems to be completely random times, I will get SSL errors happening when trying to talk to Datastore. There is no pattern in where in my application code these happen. It's just during a random call to Datastore. Here are the two flavours of errors:
ERROR:django.request:Internal Server Error: /complete/google-oauth2/
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 111, in get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/usr/local/lib/python2.7/dist-packages/django/views/decorators/cache.py", line 52, in _wrapped_view_func
response = view_func(request, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/django/views/decorators/csrf.py", line 57, in wrapped_view
return view_func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/social/apps/django_app/utils.py", line 51, in wrapper
return func(request, backend, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/social/apps/django_app/views.py", line 28, in complete
redirect_name=REDIRECT_FIELD_NAME, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/social/actions.py", line 43, in do_complete
user = backend.complete(user=user, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/social/backends/base.py", line 41, in complete
return self.auth_complete(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/social/utils.py", line 229, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/social/backends/oauth.py", line 387, in auth_complete
*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/social/utils.py", line 229, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/social/backends/oauth.py", line 396, in do_auth
return self.strategy.authenticate(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/social/strategies/django_strategy.py", line 96, in authenticate
return authenticate(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/django/contrib/auth/__init__.py", line 60, in authenticate
user = backend.authenticate(**credentials)
File "/usr/local/lib/python2.7/dist-packages/social/backends/base.py", line 82, in authenticate
return self.pipeline(pipeline, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/social/backends/base.py", line 85, in pipeline
out = self.run_pipeline(pipeline, pipeline_index, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/social/backends/base.py", line 112, in run_pipeline
result = func(*args, **out) or {}
File "/usr/local/lib/python2.7/dist-packages/social/pipeline/social_auth.py", line 20, in social_user
social = backend.strategy.storage.user.get_social_auth(provider, uid)
File "./social_gc/storage.py", line 105, in get_social_auth
return cls.objects.get(provider=provider, uid=uid)
File "/usr/local/lib/python2.7/dist-packages/gcloudoem/queryset/__init__.py", line 162, in get
num = len(clone)
File "/usr/local/lib/python2.7/dist-packages/gcloudoem/queryset/__init__.py", line 126, in __len__
self._fetch_all()
File "/usr/local/lib/python2.7/dist-packages/gcloudoem/queryset/__init__.py", line 370, in _fetch_all
self._result_cache = list(self.iterator())
File "/usr/local/lib/python2.7/dist-packages/gcloudoem/datastore/query.py", line 480, in __iter__
self.next_page()
File "/usr/local/lib/python2.7/dist-packages/gcloudoem/datastore/query.py", line 452, in next_page
transaction_id=transaction and transaction.id,
File "/usr/local/lib/python2.7/dist-packages/gcloudoem/datastore/connection.py", line 249, in run_query
response = self._rpc('runQuery', request, datastore_pb.RunQueryResponse)
File "/usr/local/lib/python2.7/dist-packages/gcloudoem/datastore/connection.py", line 159, in _rpc
data=request_pb.SerializeToString()
File "/usr/local/lib/python2.7/dist-packages/gcloudoem/datastore/connection.py", line 134, in _request
body=data
File "/usr/local/lib/python2.7/dist-packages/oauth2client/client.py", line 589, in new_request
redirections, connection_type)
File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1609, in request
(response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1351, in _request
(response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1307, in _conn_request
response = conn.getresponse()
File "/usr/lib/python2.7/httplib.py", line 1127, in getresponse
response.begin()
File "/usr/lib/python2.7/httplib.py", line 453, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.7/httplib.py", line 409, in _read_status
line = self.fp.readline(_MAXLINE + 1)
File "/usr/lib/python2.7/socket.py", line 480, in readline
data = self._sock.recv(self._rbufsize)
File "/usr/lib/python2.7/ssl.py", line 734, in recv
return self.read(buflen)
File "/usr/lib/python2.7/ssl.py", line 621, in read
v = self._sslobj.read(len or 1024)
SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1752)
Unfortunately, for the second, I don't have a full stacktrace handy:
[SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:1752)
These errors don't happen when I am using the GCD tool. Does anyone have any idea what is happening here? Is this some sort of networking problem?
I have also been receiving the [SSL: WRONG_VERSION_NUMBER] error when trying to use Datastore, however, I can repeat the error on demand. As James suggested, I get this error as soon as I introduce another thread querying Datastore. They are using completely separate application-level objects but I would imagine that as they are getting lower down in the gcloud library or lower down still there is some sort of object-sharing happening that is causing this problem.
UPDATE: I have since found the following very helpful thread (https://github.com/GoogleCloudPlatform/gcloud-python/issues/1214) which identifies an issue across the gcloud python apis due to a common dependency on the httplib2 library which turns out to not be thread-safe.
Somebody has written a wrapper for the gcloud suite that will use the requests library instead of httplib2 (gcloud requests) but it is built for Python 2.7. I didnt try to convert it for my Python3 project and instead used the very simple httplib2shim library to monkey-patch httplib2 with urllib3.
It was as simple as adding this :
import httplib2shim
httplib2shim.patch()
I'm now making calls from multiple threads without an issue.
: )
Two things come to mind which may be leading to this. Sorry this is not super specific; trying to help!
Threads - there are objects being shared across threads somehow which is causing the problem
Connections - There are too many connections being made, causing in failure (especially for the second error)