How to resolve coreferences without Internet using AllenNLP and coref-spanbert-large?

I want to resolve coreferences without Internet access using AllenNLP and the coref-spanbert-large model.
I am trying to do it the way described here: https://demo.allennlp.org/coreference-resolution
My code:
from allennlp.predictors.predictor import Predictor
import allennlp_models.tagging
predictor = Predictor.from_path(r"C:\Users\aap\Desktop\coref-spanbert-large-2021.03.10.tar.gz")
example = 'Paul Allen was born on January 21, 1953, in Seattle, Washington, to Kenneth Sam Allen and Edna Faye Allen.Allen attended Lakeside School, a private school in Seattle, where he befriended Bill Gates, two years younger, with whom he shared an enthusiasm for computers.'
pred = predictor.predict(document=example)
coref_res = predictor.coref_resolved(example)
print(pred)
print(coref_res)
When I have Internet access, the code works correctly.
But when I don't have Internet access, I get the following error:
Traceback (most recent call last):
File "C:/Users/aap/Desktop/CoreNLP/Coref_AllenNLP.py", line 14, in <module>
predictor = Predictor.from_path(r"C:\Users\aap\Desktop\coref-spanbert-large-2021.03.10.tar.gz")
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\predictors\predictor.py", line 361, in from_path
load_archive(archive_path, cuda_device=cuda_device, overrides=overrides),
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\models\archival.py", line 206, in load_archive
config.duplicate(), serialization_dir
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\models\archival.py", line 232, in _load_dataset_readers
dataset_reader_params, serialization_dir=serialization_dir
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 604, in from_params
**extras,
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 632, in from_params
kwargs = create_kwargs(constructor_to_inspect, cls, params, **extras)
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 200, in create_kwargs
cls.__name__, param_name, annotation, param.default, params, **extras
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 307, in pop_and_construct_arg
return construct_arg(class_name, name, popped_params, annotation, default, **extras)
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 391, in construct_arg
**extras,
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 341, in construct_arg
return annotation.from_params(params=popped_params, **subextras)
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 604, in from_params
**extras,
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 634, in from_params
return constructor_to_call(**kwargs) # type: ignore
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\data\token_indexers\pretrained_transformer_mismatched_indexer.py", line 63, in __init__
**kwargs,
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\data\token_indexers\pretrained_transformer_indexer.py", line 58, in __init__
model_name, tokenizer_kwargs=tokenizer_kwargs
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\data\tokenizers\pretrained_transformer_tokenizer.py", line 71, in __init__
model_name, add_special_tokens=False, **tokenizer_kwargs
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\cached_transformers.py", line 110, in get_tokenizer
**kwargs,
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 362, in from_pretrained
config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\models\auto\configuration_auto.py", line 368, in from_pretrained
config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\configuration_utils.py", line 424, in get_config_dict
use_auth_token=use_auth_token,
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\file_utils.py", line 1087, in cached_path
local_files_only=local_files_only,
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\file_utils.py", line 1268, in get_from_cache
"Connection error, and we cannot find the requested files in the cached path."
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.
Process finished with exit code 1
Please tell me what I need to do so that my code works without Internet access.

You will need a local copy of the transformer model's configuration file and vocabulary so that the tokenizer and token indexer don't need to download them:
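from transformers import AutoConfig, AutoTokenizer
# Run this once on a machine that has Internet access.
# transformer_model_name: the model_name value from the archive's config.json.
# local_config_path: any local directory of your choice.
tokenizer = AutoTokenizer.from_pretrained(transformer_model_name)
config = AutoConfig.from_pretrained(transformer_model_name)
tokenizer.save_pretrained(local_config_path)
config.to_json_file(local_config_path + "/config.json")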
You then need to override the transformer model name from the archive's configuration so that it points to the local directory (local_config_path) where you saved these files:
predictor = Predictor.from_path(
    r"C:\Users\aap\Desktop\coref-spanbert-large-2021.03.10.tar.gz",
    overrides={
        "dataset_reader.token_indexers.tokens.model_name": local_config_path,
        "validation_dataset_reader.token_indexers.tokens.model_name": local_config_path,
        "model.text_field_embedder.tokens.model_name": local_config_path,
    },
)
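The three override keys all take the same value, so a small helper can build the dictionary and fail early if the local directory is incomplete. This is only a sketch: build_offline_overrides is a hypothetical name, and checking for config.json alone is an assumption about the minimum that has to be present.

```python
import os

# Keys copied from the overrides shown above.
OVERRIDE_KEYS = (
    "dataset_reader.token_indexers.tokens.model_name",
    "validation_dataset_reader.token_indexers.tokens.model_name",
    "model.text_field_embedder.tokens.model_name",
)


def build_offline_overrides(local_config_path):
    """Return an overrides dict that points every model_name at a local directory."""
    config_file = os.path.join(local_config_path, "config.json")
    # Assumption: config.json is the file saved by config.to_json_file() earlier.
    if not os.path.isfile(config_file):
        raise FileNotFoundError(
            "Expected %s; save the tokenizer and config there first." % config_file
        )
    return {key: local_config_path for key in OVERRIDE_KEYS}
```

The result can be passed straight through, e.g. Predictor.from_path(archive_path, overrides=build_offline_overrides(local_config_path)).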

I ran into a similar problem when using structured-prediction-srl-bert without Internet access, and the logs showed four things being downloaded:
dataset_reader.bert_model_name = bert-base-uncased, Downloading 4 files
model INFO vocabulary.py - Loading token dictionary from data/structured-prediction-srl-bert.2020.12.15/vocabulary. Downloading... 4x smaller files
Spacy models 'en_core_web_sm' not found
later on, [nltk_data] Error loading punkt: <urlopen error [Errno -3] Temporary failure in name resolution> [nltk_data] Error loading wordnet: <urlopen error [Errno -3] Temporary failure in name resolution>
I solved it with these steps:
structured-prediction-srl-bert:
I downloaded structured-prediction-srl-bert.2020.12.15.tar.gz from https://demo.allennlp.org/semantic-role-labeling (Model Card tab):
https://storage.googleapis.com/allennlp-public-models/structured-prediction-srl-bert.2020.12.15.tar.gz
I extracted it into ./data/structured-prediction-srl-bert.2020.12.15
The code:
pip install allennlp==2.10.0 allennlp-models==2.10.0
from allennlp.predictors.predictor import Predictor
predictor = Predictor.from_path("./data/structured-prediction-srl-bert.2020.12.15/")
bert-base-uncased:
I created the folder ./data/bert-base-uncased and downloaded these files into it from https://huggingface.co/bert-base-uncased/tree/main:
config.json
tokenizer.json
tokenizer_config.json
vocab.txt
pytorch_model.bin
Additionally, I had to change "bert_model_name" from "bert-base-uncased" to the path "./data/bert-base-uncased", because the former triggers the download. This has to be done in ./data/structured-prediction-srl-bert.2020.12.15/config.json, where it occurs twice.
spaCy model and NLTK data (download these while still online):
python -m spacy download en_core_web_sm
python -c 'import nltk; nltk.download("punkt"); nltk.download("wordnet")'
After these steps, AllenNLP no longer needed Internet access.
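The config.json edit can also be scripted instead of done by hand. A minimal sketch, assuming the extracted config.json is plain JSON; replace_key_values and patch_config are hypothetical helper names, not part of AllenNLP.

```python
import json


def replace_key_values(node, key, old_value, new_value):
    """Recursively replace old_value with new_value wherever key appears.

    Returns the number of replacements made.
    """
    count = 0
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key and v == old_value:
                node[k] = new_value
                count += 1
            else:
                count += replace_key_values(v, key, old_value, new_value)
    elif isinstance(node, list):
        for item in node:
            count += replace_key_values(item, key, old_value, new_value)
    return count


def patch_config(config_path, local_model_dir):
    """Point every "bert_model_name" in config_path at a local directory."""
    with open(config_path) as f:
        config = json.load(f)
    replaced = replace_key_values(
        config, "bert_model_name", "bert-base-uncased", local_model_dir
    )
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return replaced
```

Calling patch_config("./data/structured-prediction-srl-bert.2020.12.15/config.json", "./data/bert-base-uncased") returns how many occurrences were rewritten, which makes it easy to confirm that both were found.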

Getting "make_aware expects a naive datetime" while migrating

I have developed an application with Django.
It works fine on my PC with the SQLite backend.
But when I try to go live on a Linux server with a MySQL backend, I get the error below during the first migration.
(env-bulkmailer) [root@localhost bulkmailer]# python3 manage.py migrate
Traceback (most recent call last):
File "/var/www/bulkmailer-folder/bulkmailer/manage.py", line 22, in <module>
main()
File "/var/www/bulkmailer-folder/bulkmailer/manage.py", line 18, in main
execute_from_command_line(sys.argv)
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/core/management/__init__.py", line 446, in execute_from_command_line
utility.execute()
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/core/management/__init__.py", line 440, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/core/management/base.py", line 402, in run_from_argv
self.execute(*args, **cmd_options)
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/core/management/base.py", line 448, in execute
output = self.handle(*args, **options)
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/core/management/base.py", line 96, in wrapped
res = handle_func(*args, **kwargs)
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/core/management/commands/migrate.py", line 114, in handle
executor = MigrationExecutor(connection, self.migration_progress_callback)
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/db/migrations/executor.py", line 18, in __init__
self.loader = MigrationLoader(self.connection)
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/db/migrations/loader.py", line 58, in __init__
self.build_graph()
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/db/migrations/loader.py", line 235, in build_graph
self.applied_migrations = recorder.applied_migrations()
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/db/migrations/recorder.py", line 82, in applied_migrations
return {
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/db/models/query.py", line 394, in __iter__
self._fetch_all()
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/db/models/query.py", line 1866, in _fetch_all
self._result_cache = list(self._iterable_class(self))
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/db/models/query.py", line 117, in __iter__
for row in compiler.results_iter(results):
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/db/models/sql/compiler.py", line 1336, in apply_converters
value = converter(value, expression, connection)
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/db/backends/mysql/operations.py", line 331, in convert_datetimefield_value
value = timezone.make_aware(value, self.connection.timezone)
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/utils/timezone.py", line 291, in make_aware
raise ValueError("make_aware expects a naive datetime, got %s" % value)
ValueError: make_aware expects a naive datetime, got 2022-11-20 12:39:18.866299+00:00
In settings:
USE_TZ = True
I have also run mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root mysql, as the Django docs describe.
I am using Django 4.1.3 and MySQL Community 8.0.30.
Thanks in advance.
I ran into the same issue. At some point, Django assumes that the data is timezone-naive without checking. Here's the workaround.
Update the make_aware function that is listed in your stack trace:
/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/utils/timezone.py, line 291, in make_aware
Instead of raising an error if the value is already aware, just return the aware value. See the last else branch below.
def make_aware(value, timezone=None, is_dst=NOT_PASSED):
    """Make a naive datetime.datetime in a given time zone aware."""
    if is_dst is NOT_PASSED:
        is_dst = None
    else:
        warnings.warn(
            "The is_dst argument to make_aware(), used by the Trunc() "
            "database functions and QuerySet.datetimes(), is deprecated as it "
            "has no effect with zoneinfo time zones.",
            RemovedInDjango50Warning,
        )
    if timezone is None:
        timezone = get_current_timezone()
    if _is_pytz_zone(timezone):
        # This method is available for pytz time zones.
        return timezone.localize(value, is_dst=is_dst)
    else:
        # Check that we won't overwrite the timezone of an aware datetime.
        if is_aware(value):
            # ADD THIS
            return value
            # REMOVE THE FOLLOWING LINE
            # raise ValueError("make_aware expects a naive datetime, got %s" % value)
        # This may be wrong around DST changes!
        return value.replace(tzinfo=timezone)
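The check that the patch relies on can be tried in isolation, outside Django. A minimal sketch using only the standard library; ensure_aware is a hypothetical name (not a Django function), and UTC as the default zone is an assumption.

```python
from datetime import datetime, timezone


def ensure_aware(value, tz=timezone.utc):
    """Return value unchanged if it is already aware, otherwise attach tz."""
    # A datetime is aware when utcoffset() returns something other than None.
    if value.utcoffset() is not None:
        return value
    # Same caveat as in the patched function: replace() ignores DST transitions.
    return value.replace(tzinfo=tz)
```

An aware value such as the 2022-11-20 12:39:18.866299+00:00 from the traceback passes through untouched, while a naive one gets the given zone attached.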

Upload Pandas dataframe as a JSON object in Cloud Storage

I have been trying to upload a Pandas dataframe as a JSON object to Cloud Storage using a Cloud Function. Following is my code:
def upload_blob(bucket_name, source_file_name, destination_blob_name):
    """Uploads a file to the bucket."""
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_file(source_file_name)
    print('File {} uploaded to {}.'.format(
        source_file_name,
        destination_blob_name))

final_file = pd.concat([df, df_second], axis=0)
final_file.to_json('/tmp/abc.json')
with open('/tmp/abc.json', 'r') as file_obj:
    upload_blob('test-bucket', file_obj, 'abc.json')
I am getting the following error at the line blob.upload_from_file(source_file_name):
Deployment failure:
Function failed on loading user code. Error message: Code in file main.py can't be loaded.
Detailed stack trace: Traceback (most recent call last):
File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 305, in check_or_load_user_function
_function_handler.load_user_function()
File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 184, in load_user_function
spec.loader.exec_module(main)
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/user_code/main.py", line 6, in <module>
import datalab.storage as gcs
File "/env/local/lib/python3.7/site-packages/datalab/storage/__init__.py", line 16, in <module>
from ._bucket import Bucket, Buckets
File "/env/local/lib/python3.7/site-packages/datalab/storage/_bucket.py", line 21, in <module>
import datalab.context
File "/env/local/lib/python3.7/site-packages/datalab/context/__init__.py", line 15, in <module>
from ._context import Context
File "/env/local/lib/python3.7/site-packages/datalab/context/_context.py", line 20, in <module>
from . import _project
File "/env/local/lib/python3.7/site-packages/datalab/context/_project.py", line 18, in <module>
import datalab.utils
File "/env/local/lib/python3.7/site-packages/datalab/utils/__init__.py", line 15
from ._async import async, async_function, async_method
^
SyntaxError: invalid syntax
What could the error be?
You are passing a string to blob.upload_from_file(), but this method requires a file object. You probably want to use blob.upload_from_filename() instead. Check the sample in the GCP docs.
Alternatively, you could open the file to get a file object and keep using blob.upload_from_file(), but that just adds unnecessary lines.
with open('/tmp/abc.json', 'r') as file_obj:
    upload_blob('test-bucket', file_obj, 'abc.json')
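The path-based variant suggested above could look roughly like this. It is a sketch, not the asker's final code: upload_json_file is a hypothetical helper, it expects a bucket object from google-cloud-storage, and setting the content type to application/json is an assumption about what is wanted.

```python
def upload_json_file(bucket, local_path, destination_blob_name):
    """Upload a local JSON file by path instead of by file object."""
    blob = bucket.blob(destination_blob_name)
    # upload_from_filename() takes a path string and opens the file itself.
    blob.upload_from_filename(local_path, content_type="application/json")
    return blob


# Usage with the real client (requires the google-cloud-storage package):
# from google.cloud import storage
# bucket = storage.Client().get_bucket("test-bucket")
# upload_json_file(bucket, "/tmp/abc.json", "abc.json")
```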
Use a bucket object instead of a string, something like:
upload_blob(conn.get_bucket(mybucket), '/tmp/abc.json', 'abc.json')

JSON Parsing with Nao robot - AttributeError

I'm using a NAO robot with naoqi version 2.1 and Choregraphe on Windows. I want to parse JSON from a file attached to the behavior. I attached the file like in that link.
Code:
def onLoad(self):
    self.filepath = os.path.join(os.path.dirname(ALFrameManager.getBehaviorPath(self.behaviorId)), "fileName.json")
def onInput_onStart(self):
    with open(self.filepath, "r") as f:
        self.data = self.json.load(f.get_Response())
    self.dataFromFile = self.data['value']
    self.log("Data from file: " + str(self.dataFromFile))
But when I run this code on the robot (connected via a router), I get an error:
[ERROR] behavior.box :_safeCallOfUserMethod:281 _Behavior__lastUploadedChoregrapheBehaviorbehavior_1136151280__root__AbfrageKontostand_3__AuslesenJSONDatei_1: Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/naoqi.py", line 271, in _safeCallOfUserMethod
func()
File "<string>", line 20, in onInput_onStart
File "/usr/lib/python2.7/site-packages/inaoqi.py", line 265, in <lambda>
__getattr__ = lambda self, name: _swig_getattr(self, behavior, name)
File "/usr/lib/python2.7/site-packages/inaoqi.py", line 55, in _swig_getattr
raise AttributeError(name)
AttributeError: json
I already tried to understand the code at the corresponding lines, but I couldn't fix the error. I do know that the type of my object f is 'file'. How can I open the JSON file as JSON?
Your problem comes from this:
self.json.load(f.get_Response())
... there is no such thing as "self.json" on a Choregraphe box; import json and then call json.load. And what is get_Response? That method doesn't exist on anything in Python that I know of.
You might want to first try writing a standalone Python script (one that doesn't use the robot) that can read your JSON file before you try it in Choregraphe. It will be easier.
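Such a standalone script could be as small as the sketch below. It assumes the file contains a JSON object with a "value" key, as the question's code implies; read_value is a hypothetical name.

```python
import json


def read_value(filepath, key="value"):
    """Load a JSON file and return the entry stored under key."""
    with open(filepath, "r") as f:
        # json.load() takes the open file object directly.
        data = json.load(f)
    return data[key]
```

Once read_value("fileName.json") works on the desktop, the same two steps (import json at the top of the box script, then json.load(f)) can be moved into onInput_onStart.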

Connect to a bucket whose name contains an uppercase letter

I am not able to connect to a bucket if the bucket name contains an uppercase letter.
I have several buckets that have capital letters in their names.
>>> mybucket = conn.get_bucket('Vig_import')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.6/site-packages/boto/s3/connection.py", line 391, in get_bucket
bucket.get_all_keys(headers, maxkeys=0)
File "/usr/lib/python2.6/site-packages/boto/s3/bucket.py", line 360, in get_all_keys
'', headers, **params)
File "/usr/lib/python2.6/site-packages/boto/s3/bucket.py", line 317, in _get_all
query_args=s)
File "/usr/lib/python2.6/site-packages/boto/s3/connection.py", line 462, in make_request
host = self.calling_format.build_host(self.server_name(), bucket)
File "/usr/lib/python2.6/site-packages/boto/s3/connection.py", line 86, in build_host
return self.get_bucket_server(server, bucket)
File "/usr/lib/python2.6/site-packages/boto/s3/connection.py", line 65, in wrapper
if len(args) == 3 and check_lowercase_bucketname(args[2]):
File "/usr/lib/python2.6/site-packages/boto/s3/connection.py", line 57, in check_lowercase_bucketname
raise BotoClientError("Bucket names cannot contain upper-case " \
boto.exception.BotoClientError: BotoClientError: Bucket names cannot contain upper-case characters when using either the sub-domain or virtual hosting calling format.
S3 recommends that you only use DNS-compliant bucket names.
Take a look at the restrictions page: http://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html
Even so, in boto you can use a different calling format for buckets with mixed-case names:
import boto
from boto.s3.connection import OrdinaryCallingFormat
conn = boto.connect_s3(calling_format=OrdinaryCallingFormat())
mybucket = conn.get_bucket('Vig_import')
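The difference between the calling formats comes down to where the bucket name ends up in the request URL. The sketch below only illustrates that idea and is not boto's implementation; the function names, the default endpoint and the key name are assumptions. Host names are case-insensitive in DNS, which is presumably why boto rejects upper-case names in the sub-domain format.

```python
def subdomain_style_url(bucket, key, endpoint="s3.amazonaws.com"):
    """Sub-domain / virtual hosting style: the bucket is part of the host name."""
    return "https://%s.%s/%s" % (bucket, endpoint, key)


def path_style_url(bucket, key, endpoint="s3.amazonaws.com"):
    """Path style, as selected by OrdinaryCallingFormat: the bucket is in the path."""
    return "https://%s/%s/%s" % (endpoint, bucket, key)


# "example.txt" is a made-up key used only for illustration.
print(subdomain_style_url("Vig_import", "example.txt"))
print(path_style_url("Vig_import", "example.txt"))
```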
Adding these lines to the .boto file worked for me:
[s3]
calling_format = boto.s3.connection.OrdinaryCallingFormat

Mercurial push to Google Code fails with ValueError

I am trying to learn to use Mercurial by pushing to Google Code.
I have two .hgrc files: one is located at $PROJECT_DIR/.hg/.hrgc and the other at $HOME/.hgrc. I have two separate files because I did not want to put the password in the central repository.
Here is the content of $PROJECT_DIR/.hg/.hrgc:
[ui]
usermane=Venkat S. Rao <vrao423@gmail.com>
verbose=true
[paths]
default-push =https:vrao423:gc4yy3vB3mc4#//personal-site423.googlecode.com/hg/us
Here is the content of $HOME/.hgrc:
[ui]
username= Venkat Rao <vrao423@gmail.com>
verbose=True
[auth]
project.prefix=https://personal-site423.googlecode.com/hg/
password=###
username=vrao423
For username I have my Gmail id.
I can commit changes to my local repository, but when I try hg push I get this error.
** unknown exception encountered, details follow
** report bug details to http://mercurial.selenic.com/bts/
** or mercurial@selenic.com
** Mercurial Distributed SCM (version 1.4.3)
** Extensions loaded:
Traceback (most recent call last):
File "/usr/bin/hg", line 27, in <module>
mercurial.dispatch.run()
File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 16, in run
sys.exit(dispatch(sys.argv[1:]))
File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 30, in dispatch
return _runcatch(u, args)
File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 46, in _runcatch
return _dispatch(ui, args)
File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 454, in _dispatch
return runcommand(lui, repo, cmd, fullargs, ui, options, d)
File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 324, in runcommand
ret = _runcommand(ui, options, cmd, d)
File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 505, in _runcommand
return checkargs()
File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 459, in checkargs
return cmdfunc()
File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 453, in <lambda>
d = lambda: util.checksignature(func)(ui, *args, **cmdoptions)
File "/usr/lib/pymodules/python2.6/mercurial/util.py", line 386, in check
return func(*args, **kwargs)
File "/usr/lib/pymodules/python2.6/mercurial/commands.py", line 2345, in push
other = hg.repository(cmdutil.remoteui(repo, opts), dest)
File "/usr/lib/pymodules/python2.6/mercurial/hg.py", line 63, in repository
repo = _lookup(path).instance(ui, path, create)
File "/usr/lib/pymodules/python2.6/mercurial/httprepo.py", line 263, in instance
inst.between([(nullid, nullid)])
File "/usr/lib/pymodules/python2.6/mercurial/httprepo.py", line 184, in between
d = self.do_read("between", pairs=n)
File "/usr/lib/pymodules/python2.6/mercurial/httprepo.py", line 128, in do_read
fp = self.do_cmd(cmd, **args)
File "/usr/lib/pymodules/python2.6/mercurial/httprepo.py", line 80, in do_cmd
resp = self.urlopener.open(urllib2.Request(cu, data, headers))
File "/usr/lib/python2.6/urllib2.py", line 391, in open
response = self._open(req, data)
File "/usr/lib/python2.6/urllib2.py", line 409, in _open
'_open', req)
File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
result = func(*args)
File "/usr/lib/pymodules/python2.6/mercurial/url.py", line 455, in https_open
self.auth = self.pwmgr.readauthtoken(req.get_full_url())
File "/usr/lib/pymodules/python2.6/mercurial/url.py", line 141, in readauthtoken
group, setting = key.split('.', 1)
ValueError: need more than 1 value to unpack
Please help me. I have tried reading the hgrc man page, but it is just gibberish to me.
Thank You
Venkat
I'm a Mercurial developer. Please report problems with our man page on the mailing list or on our bug tracker. I would love to hear from you so that we can make the man page better, so please write to us and tell us which part you found to be "gibberish".
In this particular case, the problem is that you need to write your auth section like this:
[auth]
project.prefix=https://personal-site423.googlecode.com/hg/
project.password=###
project.username=vrao423
where I would replace project with googlecode or something similar. We should of course report something sensible instead of a traceback and I can see that we already fixed this particular bug in Mercurial 1.5.
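The traceback ends at group, setting = key.split('.', 1), which suggests that every key in [auth] is expected to have the form group.setting. The sketch below mimics that grouping to show why a bare password= line fails; it is not Mercurial's code, and group_auth_settings is a hypothetical name.

```python
def group_auth_settings(items):
    """Group (key, value) pairs from an [auth] section by their prefix."""
    groups = {}
    for key, value in items:
        if "." not in key:
            # Report something readable instead of an unpacking error.
            raise ValueError(
                "auth key %r has no group prefix; write it as <group>.%s" % (key, key)
            )
        group, setting = key.split(".", 1)
        groups.setdefault(group, {})[setting] = value
    return groups


# The corrected section from the answer above, expressed as key/value pairs.
fixed = [
    ("project.prefix", "https://personal-site423.googlecode.com/hg/"),
    ("project.password", "###"),
    ("project.username", "vrao423"),
]
print(group_auth_settings(fixed))
```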