How to set up rllib multi-agent PPO? - reinforcement-learning

I have a very simple multi-agent environment set up for use with ray.rllib, and I'm trying to run a simple baseline test of a PPO vs. Random Policy training scenario as follows:
from ray import tune
from ray.tune.registry import register_env

register_env("my_env", lambda _: MyEnv(num_agents=2))
mock = MyEnv()
obs_space = mock.observation_space
act_space = mock.action_space
tune.run(
    "PPO",
    stop={"training_iteration": args.num_iters},
    config={
        "env": "my_env",
        "num_gpus": 1,
        "multiagent": {
            "policies": {
                "ppo_policy": (None, obs_space, act_space, {}),
                "random": (RandomPolicy, obs_space, act_space, {}),
            },
            # Each returned name must match a key in "policies" above.
            "policy_mapping_fn": (
                lambda agent_id: {1: "ppo_policy", 2: "random"}[agent_id]),
        },
    },
)
When testing this, I receive an error as follows:
Traceback (most recent call last):
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 467, in _process_trial
result = self.trial_executor.fetch_result(trial)
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 381, in fetch_result
result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT)
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/worker.py", line 1513, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::PPO.__init__() (pid=18163, ip=192.168.1.25)
File "python/ray/_raylet.pyx", line 414, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 449, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 450, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 452, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 407, in ray._raylet.execute_task.function_executor
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 90, in __init__
Trainer.__init__(self, config, env, logger_creator)
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 455, in __init__
super().__init__(config, logger_creator)
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/tune/trainable.py", line 174, in __init__
self._setup(copy.deepcopy(self.config))
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 596, in _setup
self._init(self.config, self.env_creator)
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 129, in _init
self.optimizer = make_policy_optimizer(self.workers, config)
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/rllib/agents/ppo/ppo.py", line 95, in choose_policy_optimizer
shuffle_sequences=config["shuffle_sequences"])
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/rllib/optimizers/multi_gpu_optimizer.py", line 99, in __init__
"Only TF graph policies are supported with multi-GPU. "
ValueError: Only TF graph policies are supported with multi-GPU. Try setting `simple_optimizer=True` instead.
I tried setting simple_optimizer: True in the config, but that gave me a NotImplementedError in the set_weights function of the RLlib policy class.
I then switched out the "PPO" in the config for "PG" and that ran fine, so the problem is unlikely to have anything to do with how I defined my environment. Any ideas on how to fix this?

Take a look at this issue. You are supposed to define:
def get_weights(self):
    return None
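A minimal sketch of where such a method would live, assuming the random policy is a custom class with nothing to train. BasePolicy below is only a stand-in for RLlib's policy base class so that the snippet runs without Ray installed, and compute_action is a hypothetical helper. The set_weights override is an assumption, added because the NotImplementedError in the question was raised from set_weights.

```python
import random


class BasePolicy:
    """Stand-in for RLlib's policy base class (an assumption, not the real API)."""

    def get_weights(self):
        raise NotImplementedError

    def set_weights(self, weights):
        raise NotImplementedError


class RandomPolicy(BasePolicy):
    """Picks actions uniformly at random, so it has no weights at all."""

    def __init__(self, num_actions):
        self.num_actions = num_actions

    def compute_action(self, obs):
        # Hypothetical helper: the observation is ignored entirely.
        return random.randrange(self.num_actions)

    def get_weights(self):
        # Nothing to synchronize between workers.
        return None

    def set_weights(self, weights):
        # Nothing to load, so accept and ignore whatever is passed in.
        pass


policy = RandomPolicy(num_actions=4)
policy.set_weights(policy.get_weights())  # no NotImplementedError is raised
```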

Getting "make_aware expects a naive datetime" while migrating

I have developed an application with Django.
It works fine on my PC with the SQLite backend.
But when I try to go live on a Linux server with a MySQL backend, I get the error below when running the first migration.
(env-bulkmailer) [root#localhost bulkmailer]# python3 manage.py migrate
Traceback (most recent call last):
File "/var/www/bulkmailer-folder/bulkmailer/manage.py", line 22, in <module>
main()
File "/var/www/bulkmailer-folder/bulkmailer/manage.py", line 18, in main
execute_from_command_line(sys.argv)
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/core/management/__init__.py", line 446, in execute_from_command_line
utility.execute()
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/core/management/__init__.py", line 440, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/core/management/base.py", line 402, in run_from_argv
self.execute(*args, **cmd_options)
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/core/management/base.py", line 448, in execute
output = self.handle(*args, **options)
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/core/management/base.py", line 96, in wrapped
res = handle_func(*args, **kwargs)
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/core/management/commands/migrate.py", line 114, in handle
executor = MigrationExecutor(connection, self.migration_progress_callback)
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/db/migrations/executor.py", line 18, in __init__
self.loader = MigrationLoader(self.connection)
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/db/migrations/loader.py", line 58, in __init__
self.build_graph()
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/db/migrations/loader.py", line 235, in build_graph
self.applied_migrations = recorder.applied_migrations()
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/db/migrations/recorder.py", line 82, in applied_migrations
return {
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/db/models/query.py", line 394, in __iter__
self._fetch_all()
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/db/models/query.py", line 1866, in _fetch_all
self._result_cache = list(self._iterable_class(self))
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/db/models/query.py", line 117, in __iter__
for row in compiler.results_iter(results):
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/db/models/sql/compiler.py", line 1336, in apply_converters
value = converter(value, expression, connection)
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/db/backends/mysql/operations.py", line 331, in convert_datetimefield_value
value = timezone.make_aware(value, self.connection.timezone)
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/utils/timezone.py", line 291, in make_aware
raise ValueError("make_aware expects a naive datetime, got %s" % value)
ValueError: make_aware expects a naive datetime, got 2022-11-20 12:39:18.866299+00:00
In settings:
USE_TZ = True
I have also run mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root mysql, as the Django docs describe.
I am using Django 4.1.3 and MySQL Community 8.0.30.
Thanks in advance.
Ran into the same issue. At some point, Django assumes that the data is timezone-naive without checking. Here's the workaround.
Update the make_aware function that is listed in your stack trace here:
/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/utils/timezone.py, line 291, in make_aware
Instead of raising an error if the value is already aware, just return the aware value. See the last else branch below.
def make_aware(value, timezone=None, is_dst=NOT_PASSED):
    """Make a naive datetime.datetime in a given time zone aware."""
    if is_dst is NOT_PASSED:
        is_dst = None
    else:
        warnings.warn(
            "The is_dst argument to make_aware(), used by the Trunc() "
            "database functions and QuerySet.datetimes(), is deprecated as it "
            "has no effect with zoneinfo time zones.",
            RemovedInDjango50Warning,
        )
    if timezone is None:
        timezone = get_current_timezone()
    if _is_pytz_zone(timezone):
        # This method is available for pytz time zones.
        return timezone.localize(value, is_dst=is_dst)
    else:
        # Check that we won't overwrite the timezone of an aware datetime.
        if is_aware(value):
            # ADD THIS
            return value
            # REMOVE THE FOLLOWING LINE
            # raise ValueError("make_aware expects a naive datetime, got %s" % value)
        # This may be wrong around DST changes!
        return value.replace(tzinfo=timezone)
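The same "return the value if it is already aware" idea can also be sketched as a standalone helper using only the standard library. This is an illustration, not Django code: make_aware_lenient is a hypothetical name, the default zone of UTC is an assumption, and pytz zones are not handled.

```python
from datetime import datetime, timezone


def is_aware(value):
    # A datetime is aware when it reports a UTC offset.
    return value.utcoffset() is not None


def make_aware_lenient(value, tz=timezone.utc):
    """Attach tz to a naive datetime; return aware datetimes unchanged."""
    if is_aware(value):
        return value
    return value.replace(tzinfo=tz)


naive = datetime(2022, 11, 20, 12, 39, 18)
aware = datetime(2022, 11, 20, 12, 39, 18, tzinfo=timezone.utc)

converted = make_aware_lenient(naive)      # gains tzinfo=timezone.utc
print(is_aware(converted))                 # True
print(make_aware_lenient(aware) is aware)  # True
```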

Pytorch torchvision.transforms execute randomly?

I am doing this transformation:
self.transform = transforms.Compose( {
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
} )
and then
image = Image.open(img_name)
if self.transform:
    image = self.transform(image)
This works for the first epoch, but then it crashes in the second epoch.
Why is Normalize getting a PIL image and not a torch.Tensor? Is the execution order of the items in transforms.Compose random?
Traceback (most recent call last):
File "/home/ubuntu/projects/ssl/src/train_supervised.py", line 63, in <module>
main()
File "/home/ubuntu/projects/ssl/src/train_supervised.py", line 60, in main
train()
File "/home/ubuntu/projects/ssl/src/train_supervised.py", line 45, in train
for i, data in enumerate(tqdm_):
File "/home/ubuntu/anaconda3/envs/pytorch-1.11.0/lib/python3.9/site-packages/tqdm/std.py", line 1195, in __iter__
for obj in iterable:
File "/home/ubuntu/anaconda3/envs/pytorch-1.11.0/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
data = self._next_data()
File "/home/ubuntu/anaconda3/envs/pytorch-1.11.0/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
return self._process_data(data)
File "/home/ubuntu/anaconda3/envs/pytorch-1.11.0/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
data.reraise()
File "/home/ubuntu/anaconda3/envs/pytorch-1.11.0/lib/python3.9/site-packages/torch/_utils.py", line 457, in reraise
raise exception
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/ubuntu/anaconda3/envs/pytorch-1.11.0/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/home/ubuntu/anaconda3/envs/pytorch-1.11.0/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/ubuntu/anaconda3/envs/pytorch-1.11.0/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/ubuntu/projects/ssl/src/data_loader.py", line 44, in __getitem__
image = self.transform(image)
File "/home/ubuntu/anaconda3/envs/pytorch-1.11.0/lib/python3.9/site-packages/torchvision/transforms/transforms.py", line 95, in __call__
img = t(img)
File "/home/ubuntu/anaconda3/envs/pytorch-1.11.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/pytorch-1.11.0/lib/python3.9/site-packages/torchvision/transforms/transforms.py", line 270, in forward
return F.normalize(tensor, self.mean, self.std, self.inplace)
File "/home/ubuntu/anaconda3/envs/pytorch-1.11.0/lib/python3.9/site-packages/torchvision/transforms/functional.py", line 341, in normalize
raise TypeError(f"Input tensor should be a torch tensor. Got {type(tensor)}.")
TypeError: Input tensor should be a torch tensor. Got <class 'PIL.Image.Image'>.
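A Python set has no guaranteed iteration order, so Compose may apply the transforms in a different order from the one you wrote, for example running Normalize before ToTensor. Pass a list instead ([] rather than {}).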
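A small sketch of why the container type matters. To stay runnable without torchvision, Compose here is a simplified stand-in that applies its callables in iteration order, and resize, to_tensor and normalize are hypothetical placeholders that only record the order in which they ran.

```python
class Compose:
    """Simplified stand-in: applies each callable in iteration order."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, x):
        for t in self.transforms:
            x = t(x)
        return x


def resize(steps):
    return steps + ["resize"]


def to_tensor(steps):
    return steps + ["to_tensor"]


def normalize(steps):
    # Mirrors the real failure: normalizing before conversion is an error.
    if "to_tensor" not in steps:
        raise TypeError("Input tensor should be a torch tensor.")
    return steps + ["normalize"]


# A list keeps the order exactly as written.
pipeline = Compose([resize, to_tensor, normalize])
applied = pipeline([])
print(applied)  # ['resize', 'to_tensor', 'normalize']
```

With the real library, the equivalent change is to pass the three transforms to transforms.Compose inside square brackets.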

pix2pixHD shows error after changing datasets

I was trying out the pix2pixHD code from the link below.
https://github.com/NVIDIA/pix2pixHD
The train.py worked with default images (in datasets/cityscapes). However, after changing images in the dataset, it shows the error below.
model [Pix2PixHDModel] was created
create web directory ./checkpoints/label2city/web...
Traceback (most recent call last):
File "/home/shimada/venv/py2.7/projects/Hiwi/pix2pixHD/train.py", line 58, in <module>
Variable(data['image']), Variable(data['feat']), infer=save_fake)
File "/home/shimada/venv/py2.7/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 325, in __call__
result = self.forward(*input, **kwargs)
File "/home/shimada/venv/py2.7/local/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 66, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/shimada/venv/py2.7/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 325, in __call__
result = self.forward(*input, **kwargs)
File "/home/shimada/venv/py2.7/projects/Hiwi/pix2pixHD/models/pix2pixHD_model.py", line 141, in forward
fake_image = self.netG.forward(input_concat)
File "/home/shimada/venv/py2.7/projects/Hiwi/pix2pixHD/models/networks.py", line 213, in forward
return self.model(input)
File "/home/shimada/venv/py2.7/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 325, in __call__
result = self.forward(*input, **kwargs)
File "/home/shimada/venv/py2.7/local/lib/python2.7/site-packages/torch/nn/modules/container.py", line 67, in forward
input = module(input)
File "/home/shimada/venv/py2.7/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 325, in __call__
result = self.forward(*input, **kwargs)
File "/home/shimada/venv/py2.7/local/lib/python2.7/site-packages/torch/nn/modules/conv.py", line 277, in forward
self.padding, self.dilation, self.groups)
File "/home/shimada/venv/py2.7/local/lib/python2.7/site-packages/torch/nn/functional.py", line 90, in conv2d
return f(input, weight, bias)
RuntimeError: Given groups=1, weight[64, 36, 7, 7], so expected input[1, 39, 518, 1030] to have 36 channels, but got 39 channels instead
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.c line=184 error=59 : device-side assert triggered
terminate called after throwing an instance of 'std::runtime_error'
what(): cuda runtime error (59) : device-side assert triggered at /pytorch/torch/lib/THC/generic/THCStorage.c:184
bash: line 1: 10965 Aborted (core dumped) env "PYCHARM_HOSTED"="1" "PYTHONUNBUFFERED"="1" "PYTHONIOENCODING"="UTF-8" "PYCHARM_MATPLOTLIB_PORT"="42188" "JETBRAINS_REMOTE_RUN"="1" "PYTHONPATH"="/home/shimada/.pycharm_helpers/pycharm_matplotlib_backend:/home/shimada/venv/py2.7/projects/Hiwi/pix2pixHD" /home/shimada/venv/py2.7/bin/python -u /home/shimada/venv/py2.7/projects/Hiwi/pix2pixHD/train.py
I replaced the images with ones of the same size (width 2048, height 1024) and the same extension (.png), and gave them the same names. Why doesn't it work?
It looks like your original image/ground truth data is grayscale. In that case you have to pass --input_nc 1 --output_nc 1, which means grayscale. You also have to change the pix2pixHD code so that it loads grayscale images.
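For reading the RuntimeError above, a short sketch of the channel check a 2D convolution performs may help. The helper name expected_in_channels is hypothetical; the shapes are copied from the error message. A conv weight is laid out as (out_channels, in_channels / groups, kernel_h, kernel_w) and the input as (batch, channels, height, width).

```python
def expected_in_channels(weight_shape, groups=1):
    # Weight layout: (out_channels, in_channels // groups, kH, kW).
    return weight_shape[1] * groups


weight_shape = (64, 36, 7, 7)     # from "weight[64, 36, 7, 7]"
input_shape = (1, 39, 518, 1030)  # from "input[1, 39, 518, 1030]"

expected = expected_in_channels(weight_shape, groups=1)
actual = input_shape[1]
print(expected, actual, actual - expected)  # 36 39 3
```

So the first generator layer was built for 36 input channels while the data fed to it has 39, which points at a disagreement between the channel-related options and what the new images produce.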

Errors when running Python code that uses Keras to implement a CNN

I am learning deep learning and want to use Keras in Python to implement a CNN, but when I run my script from the command line I get errors.
This is my source code: https://github.com/lijhong/CNN-kereas.git
The error looks like this:
Traceback (most recent call last):
File "/home/ah0818lijhong/CNN-kereas/cnn-kereas.py", line 167, in <module>
model.fit(x_train, y_train,epochs=3)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/keras/models.py", line 845, in fit
initial_epoch=initial_epoch)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 1485, in fit
initial_epoch=initial_epoch)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 1140, in _fit_loop
outs = f(ins_batch)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2073, in __call__
feed_dict=feed_dict)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 778, in run
run_metadata_ptr)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 982, in _run
feed_dict_string, options, run_metadata)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1032, in _do_run
target_list, options, run_metadata)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1052, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,868] = 115873 is not in [0, 20001)
[[Node: embedding_1/Gather = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](embedding_1/embeddings/read, _recv_embedding_1_input_0)]]
Caused by op u'embedding_1/Gather', defined at:
File "/home/ah0818lijhong/CNN-kereas/cnn-kereas.py", line 122, in <module>
model_left.add(embedding_layer)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/keras/models.py", line 422, in add
layer(x)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/keras/engine/topology.py", line 554, in __call__
output = self.call(inputs, **kwargs)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/keras/layers/embeddings.py", line 119, in call
out = K.gather(self.embeddings, inputs)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 966, in gather
return tf.gather(reference, indices)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1207, in gather
validate_indices=validate_indices, name=name)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/ah0818lijhong/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): indices[0,868] = 115873 is not in [0, 20001)
[[Node: embedding_1/Gather = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](embedding_1/embeddings/read, _recv_embedding_1_input_0)]]
I hope someone can help me fix it.

Google Datastore SSL errors in Python from Google Compute Instances

I have a Python Django application running on a Google Compute instance. It uses gcloudoem to interface from Django to Google Datastore. gcloudoem uses the same underlying code to communicate with Datastore as gcloud-python 0.5.x.
At what seems to be completely random times, I will get SSL errors happening when trying to talk to Datastore. There is no pattern in where in my application code these happen. It's just during a random call to Datastore. Here are the two flavours of errors:
ERROR:django.request:Internal Server Error: /complete/google-oauth2/
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 111, in get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/usr/local/lib/python2.7/dist-packages/django/views/decorators/cache.py", line 52, in _wrapped_view_func
response = view_func(request, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/django/views/decorators/csrf.py", line 57, in wrapped_view
return view_func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/social/apps/django_app/utils.py", line 51, in wrapper
return func(request, backend, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/social/apps/django_app/views.py", line 28, in complete
redirect_name=REDIRECT_FIELD_NAME, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/social/actions.py", line 43, in do_complete
user = backend.complete(user=user, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/social/backends/base.py", line 41, in complete
return self.auth_complete(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/social/utils.py", line 229, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/social/backends/oauth.py", line 387, in auth_complete
*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/social/utils.py", line 229, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/social/backends/oauth.py", line 396, in do_auth
return self.strategy.authenticate(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/social/strategies/django_strategy.py", line 96, in authenticate
return authenticate(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/django/contrib/auth/__init__.py", line 60, in authenticate
user = backend.authenticate(**credentials)
File "/usr/local/lib/python2.7/dist-packages/social/backends/base.py", line 82, in authenticate
return self.pipeline(pipeline, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/social/backends/base.py", line 85, in pipeline
out = self.run_pipeline(pipeline, pipeline_index, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/social/backends/base.py", line 112, in run_pipeline
result = func(*args, **out) or {}
File "/usr/local/lib/python2.7/dist-packages/social/pipeline/social_auth.py", line 20, in social_user
social = backend.strategy.storage.user.get_social_auth(provider, uid)
File "./social_gc/storage.py", line 105, in get_social_auth
return cls.objects.get(provider=provider, uid=uid)
File "/usr/local/lib/python2.7/dist-packages/gcloudoem/queryset/__init__.py", line 162, in get
num = len(clone)
File "/usr/local/lib/python2.7/dist-packages/gcloudoem/queryset/__init__.py", line 126, in __len__
self._fetch_all()
File "/usr/local/lib/python2.7/dist-packages/gcloudoem/queryset/__init__.py", line 370, in _fetch_all
self._result_cache = list(self.iterator())
File "/usr/local/lib/python2.7/dist-packages/gcloudoem/datastore/query.py", line 480, in __iter__
self.next_page()
File "/usr/local/lib/python2.7/dist-packages/gcloudoem/datastore/query.py", line 452, in next_page
transaction_id=transaction and transaction.id,
File "/usr/local/lib/python2.7/dist-packages/gcloudoem/datastore/connection.py", line 249, in run_query
response = self._rpc('runQuery', request, datastore_pb.RunQueryResponse)
File "/usr/local/lib/python2.7/dist-packages/gcloudoem/datastore/connection.py", line 159, in _rpc
data=request_pb.SerializeToString()
File "/usr/local/lib/python2.7/dist-packages/gcloudoem/datastore/connection.py", line 134, in _request
body=data
File "/usr/local/lib/python2.7/dist-packages/oauth2client/client.py", line 589, in new_request
redirections, connection_type)
File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1609, in request
(response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1351, in _request
(response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1307, in _conn_request
response = conn.getresponse()
File "/usr/lib/python2.7/httplib.py", line 1127, in getresponse
response.begin()
File "/usr/lib/python2.7/httplib.py", line 453, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.7/httplib.py", line 409, in _read_status
line = self.fp.readline(_MAXLINE + 1)
File "/usr/lib/python2.7/socket.py", line 480, in readline
data = self._sock.recv(self._rbufsize)
File "/usr/lib/python2.7/ssl.py", line 734, in recv
return self.read(buflen)
File "/usr/lib/python2.7/ssl.py", line 621, in read
v = self._sslobj.read(len or 1024)
SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1752)
Unfortunately, for the second, I don't have a full stacktrace handy:
[SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:1752)
These errors don't happen when I am using the GCD tool. Does anyone have any idea what is happening here? Is this some sort of networking problem?
I have also been receiving the [SSL: WRONG_VERSION_NUMBER] error when trying to use Datastore; however, I can reproduce the error on demand. As James suggested, I get this error as soon as I introduce another thread querying Datastore. The threads use completely separate application-level objects, but I would imagine that lower down in the gcloud library, or lower still, some sort of object sharing is happening that causes this problem.
UPDATE: I have since found the following very helpful thread (https://github.com/GoogleCloudPlatform/gcloud-python/issues/1214), which identifies an issue across the gcloud Python APIs due to a common dependency on the httplib2 library, which turns out not to be thread-safe.
Somebody has written a wrapper for the gcloud suite that uses the requests library instead of httplib2 (gcloud requests), but it is built for Python 2.7. I didn't try to convert it for my Python 3 project and instead used the very simple httplib2shim library to monkey-patch httplib2 with urllib3.
It was as simple as adding this:
import httplib2shim
httplib2shim.patch()
I'm now making calls from multiple threads without an issue.
: )
Two things come to mind that may be leading to this. Sorry this is not super specific; trying to help!
Threads - objects are somehow being shared across threads, which is causing the problem
Connections - too many connections are being made, resulting in failures (especially for the second error)
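If shared objects turn out to be the cause, one common way to avoid sharing is to give each thread its own client. The sketch below uses only the standard library; make_client is a hypothetical factory standing in for whatever builds the HTTP or Datastore connection, and this is an illustration of the pattern rather than the fix used in the thread above.

```python
import threading

_local = threading.local()


def make_client():
    # Hypothetical factory: stands in for building an HTTP/Datastore client.
    return object()


def get_client():
    """Return this thread's own client, creating it on first use."""
    if not hasattr(_local, "client"):
        _local.client = make_client()
    return _local.client


seen = []
lock = threading.Lock()


def worker():
    client = get_client()
    assert client is get_client()  # same object within one thread
    with lock:
        seen.append(client)  # keep a reference so ids stay unique


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len({id(c) for c in seen}))  # 4: every thread got its own client
```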