Upload Pandas dataframe as a JSON object in Cloud Storage

I have been trying to upload a Pandas dataframe as a JSON object to Cloud Storage using a Cloud Function. The following is my code:
def upload_blob(bucket_name, source_file_name, destination_blob_name):
    """Uploads a file to the bucket."""
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_file(source_file_name)
    print('File {} uploaded to {}.'.format(
        source_file_name,
        destination_blob_name))

final_file = pd.concat([df, df_second], axis=0)
final_file.to_json('/tmp/abc.json')
with open('/tmp/abc.json', 'r') as file_obj:
    upload_blob('test-bucket', file_obj, 'abc.json')
I am getting the following error at the line blob.upload_from_file(source_file_name):
Deployment failure:
Function failed on loading user code. Error message: Code in file main.py can't be loaded.
Detailed stack trace: Traceback (most recent call last):
  File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 305, in check_or_load_user_function
    _function_handler.load_user_function()
  File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 184, in load_user_function
    spec.loader.exec_module(main)
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/user_code/main.py", line 6, in <module>
    import datalab.storage as gcs
  File "/env/local/lib/python3.7/site-packages/datalab/storage/__init__.py", line 16, in <module>
    from ._bucket import Bucket, Buckets
  File "/env/local/lib/python3.7/site-packages/datalab/storage/_bucket.py", line 21, in <module>
    import datalab.context
  File "/env/local/lib/python3.7/site-packages/datalab/context/__init__.py", line 15, in <module>
    from ._context import Context
  File "/env/local/lib/python3.7/site-packages/datalab/context/_context.py", line 20, in <module>
    from . import _project
  File "/env/local/lib/python3.7/site-packages/datalab/context/_project.py", line 18, in <module>
    import datalab.utils
  File "/env/local/lib/python3.7/site-packages/datalab/utils/__init__.py", line 15
    from ._async import async, async_function, async_method
                        ^
SyntaxError: invalid syntax
What could be causing this error?

You are passing a string to blob.upload_from_file(), but this method requires a file object. You probably want to use blob.upload_from_filename() instead. Check the sample in the GCP docs.
Alternatively, you could get the file object and keep using blob.upload_from_file(), but that adds unnecessary extra lines:
with open('/tmp/abc.json', 'r') as file_obj:
    upload_blob('test-bucket', file_obj, 'abc.json')
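A minimal sketch of another option that skips the temporary file, assuming the google-cloud-storage client library is installed and the function's service account can write to the bucket. The helper name upload_dataframe_as_json and the example function are hypothetical. The sketch serializes the DataFrame in memory and uploads the resulting string with blob.upload_from_string():

```python
def upload_dataframe_as_json(df, bucket, destination_blob_name):
    """Serialize df to a JSON string in memory and upload it to the bucket.

    `bucket` is expected to be a google.cloud.storage Bucket (or any object
    exposing the same .blob(name) method); this helper is a hypothetical
    illustration, not part of the library.
    """
    blob = bucket.blob(destination_blob_name)
    # upload_from_string avoids writing /tmp/abc.json first.
    blob.upload_from_string(df.to_json(), content_type="application/json")
    return blob


def example():
    # Imports live here so the helper above has no hard dependencies.
    import pandas as pd
    from google.cloud import storage

    df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
    # 'test-bucket' and 'abc.json' are the names used in the question.
    bucket = storage.Client().bucket("test-bucket")
    upload_dataframe_as_json(df, bucket, "abc.json")
```

Calling example() requires valid Google Cloud credentials; inside a Cloud Function these are normally provided by the runtime.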

Use a bucket object instead of a string, something like upload_blob(conn.get_bucket(mybucket), '/tmp/abc.json', 'abc.json').

Related

How to resolve coreferences without Internet using AllenNLP and coref-spanbert-large?

I want to resolve coreferences without Internet access using AllenNLP and the coref-spanbert-large model.
I am trying to do it the way it is described here: https://demo.allennlp.org/coreference-resolution
My code:
from allennlp.predictors.predictor import Predictor
import allennlp_models.tagging
predictor = Predictor.from_path(r"C:\Users\aap\Desktop\coref-spanbert-large-2021.03.10.tar.gz")
example = 'Paul Allen was born on January 21, 1953, in Seattle, Washington, to Kenneth Sam Allen and Edna Faye Allen.Allen attended Lakeside School, a private school in Seattle, where he befriended Bill Gates, two years younger, with whom he shared an enthusiasm for computers.'
pred = predictor.predict(document=example)
coref_res = predictor.coref_resolved(example)
print(pred)
print(coref_res)
When I have Internet access, the code works correctly.
But when I don't have Internet access, I get the following errors:
Traceback (most recent call last):
File "C:/Users/aap/Desktop/CoreNLP/Coref_AllenNLP.py", line 14, in <module>
predictor = Predictor.from_path(r"C:\Users\aap\Desktop\coref-spanbert-large-2021.03.10.tar.gz")
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\predictors\predictor.py", line 361, in from_path
load_archive(archive_path, cuda_device=cuda_device, overrides=overrides),
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\models\archival.py", line 206, in load_archive
config.duplicate(), serialization_dir
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\models\archival.py", line 232, in _load_dataset_readers
dataset_reader_params, serialization_dir=serialization_dir
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 604, in from_params
**extras,
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 632, in from_params
kwargs = create_kwargs(constructor_to_inspect, cls, params, **extras)
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 200, in create_kwargs
cls.__name__, param_name, annotation, param.default, params, **extras
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 307, in pop_and_construct_arg
return construct_arg(class_name, name, popped_params, annotation, default, **extras)
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 391, in construct_arg
**extras,
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 341, in construct_arg
return annotation.from_params(params=popped_params, **subextras)
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 604, in from_params
**extras,
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 634, in from_params
return constructor_to_call(**kwargs) # type: ignore
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\data\token_indexers\pretrained_transformer_mismatched_indexer.py", line 63, in __init__
**kwargs,
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\data\token_indexers\pretrained_transformer_indexer.py", line 58, in __init__
model_name, tokenizer_kwargs=tokenizer_kwargs
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\data\tokenizers\pretrained_transformer_tokenizer.py", line 71, in __init__
model_name, add_special_tokens=False, **tokenizer_kwargs
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\cached_transformers.py", line 110, in get_tokenizer
**kwargs,
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 362, in from_pretrained
config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\models\auto\configuration_auto.py", line 368, in from_pretrained
config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\configuration_utils.py", line 424, in get_config_dict
use_auth_token=use_auth_token,
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\file_utils.py", line 1087, in cached_path
local_files_only=local_files_only,
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\file_utils.py", line 1268, in get_from_cache
"Connection error, and we cannot find the requested files in the cached path."
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.
Process finished with exit code 1
What do I need to do to make my code work without Internet access?

You will need a local copy of the transformer model's configuration file and vocabulary so that the tokenizer and token indexer don't need to download them. Run this once while you still have Internet access:
from transformers import AutoConfig, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(transformer_model_name)
config = AutoConfig.from_pretrained(transformer_model_name)
tokenizer.save_pretrained(local_config_path)
config.to_json_file(local_config_path + "/config.json")
You will then need to override the transformer model name in the configuration file to the local directory (local_config_path) where you saved these things:
predictor = Predictor.from_path(
    r"C:\Users\aap\Desktop\coref-spanbert-large-2021.03.10.tar.gz",
    overrides={
        "dataset_reader.token_indexers.tokens.model_name": local_config_path,
        "validation_dataset_reader.token_indexers.tokens.model_name": local_config_path,
        "model.text_field_embedder.tokens.model_name": local_config_path,
    },
)
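The two steps above could be wrapped roughly as follows. This is only a sketch: the function names are hypothetical, transformer_model_name must be whatever model name appears in the archive's own configuration, and it assumes an AllenNLP version whose Predictor.from_path accepts overrides as a dictionary, as in the snippet above.

```python
def build_offline_overrides(local_config_path):
    """Return overrides that point each transformer name at a local directory."""
    targets = (
        "dataset_reader.token_indexers.tokens.model_name",
        "validation_dataset_reader.token_indexers.tokens.model_name",
        "model.text_field_embedder.tokens.model_name",
    )
    return {key: local_config_path for key in targets}


def save_tokenizer_files(transformer_model_name, local_config_path):
    """Run once while online: store the tokenizer and config files locally."""
    from transformers import AutoConfig, AutoTokenizer  # imported lazily

    AutoTokenizer.from_pretrained(transformer_model_name).save_pretrained(local_config_path)
    # save_pretrained writes config.json into the same directory.
    AutoConfig.from_pretrained(transformer_model_name).save_pretrained(local_config_path)


def load_predictor_offline(archive_path, local_config_path):
    """Load the archive with every transformer name redirected to local files."""
    from allennlp.predictors.predictor import Predictor  # imported lazily

    return Predictor.from_path(
        archive_path,
        overrides=build_offline_overrides(local_config_path),
    )
```

Call save_tokenizer_files once on a machine with Internet access, copy the resulting directory alongside the model archive, and then use load_predictor_offline on the offline machine.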

I ran into a similar problem when using structured-prediction-srl-bert without Internet access, and the logs showed four things being downloaded:
dataset_reader.bert_model_name = bert-base-uncased, Downloading 4 files
model INFO vocabulary.py - Loading token dictionary from data/structured-prediction-srl-bert.2020.12.15/vocabulary. Downloading... 4x smaller files
Spacy models 'en_core_web_sm' not found
later on, [nltk_data] Error loading punkt: <urlopen error [Errno -3] Temporary failure in name resolution> [nltk_data] Error loading wordnet: <urlopen error [Errno -3] Temporary failure in name resolution>
I have solved it with these steps:
structured-prediction-srl-bert:
I downloaded structured-prediction-srl-bert.2020.12.15.tar.gz from https://demo.allennlp.org/semantic-role-labeling (Model Card tab):
https://storage.googleapis.com/allennlp-public-models/structured-prediction-srl-bert.2020.12.15.tar.gz
I unpacked it into ./data/structured-prediction-srl-bert.2020.12.15
The code:
pip install allennlp==2.10.0 allennlp-models==2.10.0
from allennlp.predictors.predictor import Predictor
predictor = Predictor.from_path("./data/structured-prediction-srl-bert.2020.12.15/")
bert-base-uncased
I have created a folder ./data/bert-base-uncased and there I have downloaded these files from https://huggingface.co/bert-base-uncased/tree/main
config.json
tokenizer.json
tokenizer_config.json
vocab.txt
pytorch_model.bin
Additionally, I had to change "bert_model_name" from "bert-base-uncased" to the path "./data/bert-base-uncased"; the former triggers the download. This has to be done in ./data/structured-prediction-srl-bert.2020.12.15/config.json, and there are two occurrences.
Spacy model and NLTK data:
python -m spacy download en_core_web_sm
python -c 'import nltk; nltk.download("punkt"); nltk.download("wordnet")'
After these steps the allennlp did not need internet anymore.
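The manual edit of config.json described above could be scripted along these lines. This is a sketch under assumptions: the helper name point_config_at_local_model is hypothetical, and it assumes the unpacked config.json is plain JSON in which every "bert_model_name" string should be replaced.

```python
import json
from pathlib import Path


def _replace_key(node, key, new_value):
    """Recursively set every string value stored under `key`; return the count."""
    count = 0
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key and isinstance(value, str):
                node[name] = new_value
                count += 1
            else:
                count += _replace_key(value, key, new_value)
    elif isinstance(node, list):
        for item in node:
            count += _replace_key(item, key, new_value)
    return count


def point_config_at_local_model(config_path, local_model_dir, key="bert_model_name"):
    """Rewrite config.json so `key` refers to a local directory, not a hub name."""
    config_path = Path(config_path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    replaced = _replace_key(config, key, str(local_model_dir))
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return replaced
```

For the layout described above, the call would be point_config_at_local_model("./data/structured-prediction-srl-bert.2020.12.15/config.json", "./data/bert-base-uncased"), which should report two replacements if the file matches the description.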

Getting this error with py2.7 as well as with py3.7

I get this error with Python 2.7 as well as with Python 3.7:
Exception happened during processing of request from ('10.0.2.15', 41994)
Traceback (most recent call last):
File "/usr/lib/python3.8/socketserver.py", line 650, in process_request_thread
self.finish_request(request, client_address)
File "/usr/lib/python3.8/socketserver.py", line 360, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/usr/lib/python3.8/socketserver.py", line 720, in __init__
self.handle()
File "/usr/lib/python3.8/http/server.py", line 427, in handle
self.handle_one_request()
File "/usr/lib/python3.8/http/server.py", line 415, in handle_one_request
method()
File "/usr/share/set/src/webattack/harvester/harvester.py", line 334, in do_POST
filewrite.write(cgi.escape("PARAM: " + line + "\n"))
AttributeError: module 'cgi' has no attribute 'escape'

You need to add import html under import cgi and then change cgi.escape to html.escape, because cgi.escape was removed in Python 3.8. Make the change in /usr/share/set/src/webattack/harvester/harvester.py (for details, see https://github.com/trustedsec/social-engineer-toolkit/issues/721).

import html
html.escape(string_).encode('ascii', 'xmlcharrefreplace')
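As a small illustration of the replacement, here is a sketch of the failing line rewritten with html.escape. The function name escape_param is hypothetical. One difference to be aware of: cgi.escape left quotes alone by default, while html.escape also escapes them unless quote=False is passed.

```python
import html


def escape_param(line):
    """Escape &, < and > the way the old cgi.escape default did."""
    # quote=False keeps " and ' untouched, matching cgi.escape's default.
    return html.escape("PARAM: " + line + "\n", quote=False)


if __name__ == "__main__":
    print(escape_param('<input name="user">'), end="")
```

Dropping quote=False is also reasonable if escaped quotes are acceptable in the written file.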

Issue with KeyError: 'babel'

I am very new to Flask and everything related to Web development. I am building an app in Flask with Dash integrated and it is failing with the following error:
C:\Users\satpute\PycharmProjects\RMAPartsDepotPlanning\venv\Scripts\python.exe C:/PycharmProjects/RMAPrototype/dashapp.py
Traceback (most recent call last):
  File "C:\PycharmProjects\RMAPrototype\dashapp.py", line 4, in <module>
    app = create_app()
  File "C:\PycharmProjects\RMAPrototype\PDP\__init__.py", line 12, in create_app
    from PDP import PDPApp
  File "C:\PycharmProjects\RMAPrototype\PDP\PartsDepotPlanningApp.py", line 14, in <module>
    from flask_table import Table, Col, LinkCol
  File "C:\PycharmProjects\RMAPrototype\venv\lib\site-packages\flask_table\__init__.py", line 1, in <module>
    from .table import Table, create_table
  File "C:\PycharmProjects\RMAPrototype\venv\lib\site-packages\flask_table\table.py", line 8, in <module>
    from .columns import Col
  File "C:\PycharmProjects\RMAPrototype\venv\lib\site-packages\flask_table\columns.py", line 161, in <module>
    class BoolCol(OptCol):
  File "C:\PycharmProjects\RMAPrototype\venv\lib\site-packages\flask_table\columns.py", line 166, in BoolCol
    yes_display = _('Yes')
  File "C:\PycharmProjects\RMAPrototype\venv\lib\site-packages\flask_babel\__init__.py", line 548, in gettext
    t = get_translations()
  File "C:\PycharmProjects\RMAPrototype\venv\lib\site-packages\flask_babel\__init__.py", line 217, in get_translations
    babel = current_app.extensions['babel']
KeyError: 'babel'
Process finished with exit code 1
How can I go about troubleshooting this? I tried different approaches but couldn't resolve it so far.

Using json with jsondatetime

json.dumps gives an error if both json and jsondatetime are imported. The error is the following:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.5/json/__init__.py", line 230, in dumps
return _default_encoder.encode(obj)
TypeError: encode() missing 1 required positional argument: 'o'
But if I import only json, json.dumps works fine. I don't know how to deal with this, because I need jsondatetime as well.
This works::
import json
json.dumps({'DbName': 'DB','Hostname': '10.0.0.6','DbUsername':'SYSTEM'})
'{"Hostname": "10.0.0.6","DbName": "DB", "DbUsername": "SYSTEM"}'
This does not work::
import jsondatetime
import json
json.dumps({'DbName': 'DB', 'Hostname': '10.0.0.6', 'DbUsername': 'SYSTEM'})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.5/json/__init__.py", line 230, in dumps
return _default_encoder.encode(obj)
TypeError: encode() missing 1 required positional argument: 'o'

jsondatetime is a drop-in replacement for json. You should only have
import jsondatetime as json
From the documentation:
JSON-datetime is a very simple wrapper around Python simplejson loads method. It decodes datetime values contained in JSON strings.

Cuda library dead after linux-updates

The system ran beautifully until I received update notifications from Ubuntu and accepted them. After the updates ran, I get a big CUDA issue:
('fp: ', <open file '/usr/local/lib/python2.7/dist-packages/tensorflow/python/_pywrap_tensorflow.so', mode 'rb' at 0x7f8af1a63300>)
('pathname: ', '/usr/local/lib/python2.7/dist-packages/tensorflow/python/_pywrap_tensorflow.so')
('description: ', ('.so', 'rb', 3))
Traceback (most recent call last):
File "translate.py", line 41, in <module>
import tensorflow.python.platform
File "/usr/local/lib/python2.7/dist-packages/tensorflow/__init__.py", line 23, in <module>
from tensorflow.python import *
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/__init__.py", line 45, in <module>
from tensorflow.python import pywrap_tensorflow
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 31, in <module>
_pywrap_tensorflow = swig_import_helper()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 27, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
ImportError: libcudart.so.7.5: cannot open shared object file: No such file or directory
Any idea?
thx

It seems like your system cannot find "libcudart.so.7.5".
libcudart.so.7.5: cannot open shared object file: No such file or directory
Could you check that this file exists and that PATH and LD_LIBRARY_PATH are set correctly?
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
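To check the first part of that question from Python, a rough sketch like the following could help. The helper name find_library is hypothetical, and it only inspects the directories listed in LD_LIBRARY_PATH (or a path string you pass in); the dynamic loader also consults other locations, so an empty result here is a hint rather than proof.

```python
import os


def find_library(filename, search_path=None):
    """Return every match for `filename` in a colon-separated directory list."""
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing ':'
        candidate = os.path.join(directory, filename)
        if os.path.exists(candidate):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    matches = find_library("libcudart.so.7.5")
    print(matches if matches else "libcudart.so.7.5 not found on LD_LIBRARY_PATH")
```

If nothing is found, the updates may have removed or replaced the CUDA 7.5 runtime, or /usr/local/cuda/lib64 may simply no longer be on LD_LIBRARY_PATH.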
