Hazm: POSTagger(): ArgumentError: argument 2: <class 'TypeError'>: wrong type - nltk

I get an error when running the code below. Could you help me?
from __future__ import unicode_literals
from hazm import *
tagger = POSTagger(model='resources/postagger.model')
tagger.tag(word_tokenize('ما بسیار کتاب موانیم'))
Error:
---------------------------------------------------------------------------
ArgumentError Traceback (most recent call last)
<ipython-input-16-1d74d781e0c1> in <module>
1 tagger = POSTagger(model='resources/postagger.model')
----> 2 tagger = POSTagger()
3 tagger.tag(word_tokenize('ما بسیار کتاب موانیم'))
~/.local/lib/python3.6/site-packages/hazm/SequenceTagger.py in __init__(self, patterns, **options)
21 def __init__(self, patterns=[], **options):
22 from wapiti import Model
---> 23 self.model = Model(patterns='\n'.join(patterns), **options)
24
25 def train(self, sentences):
~/.local/lib/python3.6/site-packages/wapiti/api.py in __init__(self, patterns, encoding, **options)
283 self._model = _wapiti.api_new_model(
284 ctypes.pointer(self.options),
--> 285 self.patterns
286 )
287
ArgumentError: argument 2: <class 'TypeError'>: wrong type
I am using Ubuntu 18.04 on Windows 10. I also put the model files mentioned above in a resources folder next to the code.
Python 3.6.9
Package of hazm
I have no problem running the Chunker from this package:
chunker = Chunker(model='resources/chunker.model')
tagged = tagger.tag(word_tokenize('واقعا ک بعضیا چقد بی درکن و ادعا دارن فقط بنده خدا لابد دسترسی نداره ب دکتری چیزی نگران شد'))
tree2brackets(chunker.parse(tagged))

This is caused by the wapiti package. The wapiti Python binding does not support Python 3 and works only with Python 2. If you need a POS tagger, you should use a different POS tagger package.
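The "wrong type" message in the traceback is raised by ctypes, which on Python 3 refuses a str where a C string (char *) argument is expected. That is most likely what happens when the joined patterns string is handed to the C library. The sketch below is not hazm or wapiti code; it only illustrates the ctypes behaviour with the standard library. The helper name accepts_as_c_string and the sample text are made up for illustration.

```python
import ctypes


def accepts_as_c_string(value):
    """Return True if ctypes can pass `value` as a C `char *` argument."""
    try:
        ctypes.c_char_p.from_param(value)
        return True
    except TypeError:
        # ctypes reports this as "wrong type", the same wording as in the traceback
        return False


sample = "some pattern text"  # hypothetical stand-in for the joined patterns

print(accepts_as_c_string(sample))                  # a str is rejected on Python 3
print(accepts_as_c_string(sample.encode("utf-8")))  # bytes are accepted
```

On Python 2 a plain string literal is already a byte string, which would explain why the same call only works there.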

Related

Type Error when trying to save model in tensorflow python, getting trace 'Unrecognized type <class 'tensorflow.python.framework.ops.EagerTensor'>.'

I'm relatively new to Python and machine learning. I created a model that classifies chest X-ray scans, but when I try to save it I get this error:
TypeError: Unable to serialize [2.0896919 2.1128857 2.1081853] to JSON. Unrecognized type <class 'tensorflow.python.framework.ops.EagerTensor'>.
The full stack trace is:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_22092\3924096485.py in <module>
1 working_dir=os.getcwd()
2 subject='chest scans'
----> 3 save_model(subject, classes, img_size, f1score, working_dir)
~\AppData\Local\Temp\ipykernel_22092\541087647.py in save_model(subject, classes, img_size, f1score, working_dir)
3 save_id=f'{name}-{f1score:5.2f}.h5'
4 model_save_loc=os.path.join(working_dir, save_id)
----> 5 model.save(model_save_loc)
6 msg= f'model was saved as {model_save_loc}'
7 print_in_color(msg, (0,255,255), (100,100,100)) # cyan foreground
~\anaconda3\lib\site-packages\keras\utils\traceback_utils.py in error_handler(*args, **kwargs)
68 # To get the full stack trace, call:
69 # `tf.debugging.disable_traceback_filtering()`
---> 70 raise e.with_traceback(filtered_tb) from None
71 finally:
72 del filtered_tb
~\anaconda3\lib\json\__init__.py in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
232 if cls is None:
233 cls = JSONEncoder
--> 234 return cls(
235 skipkeys=skipkeys, ensure_ascii=ensure_ascii,
236 check_circular=check_circular, allow_nan=allow_nan, indent=indent,
~\anaconda3\lib\json\encoder.py in encode(self, o)
197 # exceptions aren't as detailed. The list call should be roughly
198 # equivalent to the PySequence_Fast that ''.join() would do.
--> 199 chunks = self.iterencode(o, _one_shot=True)
200 if not isinstance(chunks, (list, tuple)):
201 chunks = list(chunks)
~\anaconda3\lib\json\encoder.py in iterencode(self, o, _one_shot)
255 self.key_separator, self.item_separator, self.sort_keys,
256 self.skipkeys, _one_shot)
--> 257 return _iterencode(o, 0)
258
259 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,
TypeError: Unable to serialize [2.0896919 2.1128857 2.1081853] to JSON. Unrecognized type <class 'tensorflow.python.framework.ops.EagerTensor'>.
I have created this function to save the model:
def save_model(subject, classes, img_size, f1score, working_dir):
    name=subject + '-' + str(len(classes)) + '-(' + str(img_size[0]) + ' X ' + str(img_size[1]) + ')'
    save_id=f'{name}-{f1score:5.2f}.h5'
    model_save_loc=os.path.join(working_dir, save_id)
    model.save(model_save_loc)
    msg= f'model was saved as {model_save_loc}'
    print_in_color(msg, (0,255,255), (100,100,100)) # cyan foreground
I have the following packages installed for tensorflow.
tensorboard 2.8.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorflow 2.8.1
tensorflow-estimator 2.8.0
tensorflow-io-gcs-filesystem 0.28.0
termcolor 1.1.0
keras 2.8.0
keras-nightly 2.5.0.dev2021032900
Keras-Preprocessing 1.1.2
Any help would be great! I tried to save the ML model I created, but saving fails because the TensorFlow EagerTensor cannot be serialized to JSON. Thanks!

Getting loading error while loading catboost in notebook

I am unable to import catboost in a Jupyter notebook. I get an import error:
ImportError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_27572/4101664180.py in
----> 1 import catboost as ctb
~\Anaconda3\Lib\site-packages\catboost\__init__.py in
----> 1 from .core import (
2 FeaturesData, EFstrType, EShapCalcType, EFeaturesSelectionAlgorithm, Pool, CatBoost,
3 CatBoostClassifier, CatBoostRegressor, CatBoostRanker, CatBoostError, cv, train, sum_models, _have_equal_features,
4 to_regressor, to_classifier, to_ranker, MultiRegressionCustomMetric, MultiRegressionCustomObjective
5 ) # noqa
~\Anaconda3\Lib\site-packages\catboost\core.py in
41 _typeof = type
42
---> 43 from .plot_helpers import save_plot_file, try_plot_offline
44 from . import _catboost
45 from .metrics import BuiltinMetric
~\Anaconda3\Lib\site-packages\catboost\plot_helpers.py in
1 import warnings
2
----> 3 from . import _catboost
4 fspath = _catboost.fspath
5
ImportError: DLL load failed while importing _catboost: The specified module could not be found.

Error while loading a sentence transformer model

I'm trying to load a transformer model with SentenceTransformer. Below is the code:
# Now we create a SentenceTransformer model from scratch
word_emb = models.Transformer('paraphrase-mpnet-base-v2')
pooling = models.Pooling(word_emb.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_emb, pooling])
Below is the error
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_2948\3254427654.py in <module>
1 # Now we create a SentenceTransformer model from scratch
----> 2 word_emb = models.Transformer('paraphrase-mpnet-base-v2')
3 pooling = models.Pooling(word_emb.get_word_embedding_dimension())
4 model = SentenceTransformer(modules=[word_emb, pooling])
~\miniconda3\envs\atoti\lib\site-packages\sentence_transformers\models\Transformer.py in __init__(self, model_name_or_path, max_seq_length, model_args, cache_dir, tokenizer_args, do_lower_case, tokenizer_name_or_path)
27
28 config = AutoConfig.from_pretrained(model_name_or_path, **model_args, cache_dir=cache_dir)
---> 29 self._load_model(model_name_or_path, config, cache_dir)
30
31 self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name_or_path if tokenizer_name_or_path is not None else model_name_or_path, cache_dir=cache_dir, **tokenizer_args)
~\miniconda3\envs\atoti\lib\site-packages\sentence_transformers\models\Transformer.py in _load_model(self, model_name_or_path, config, cache_dir)
47 self._load_t5_model(model_name_or_path, config, cache_dir)
48 else:
---> 49 self.auto_model = AutoModel.from_pretrained(model_name_or_path, config=config, cache_dir=cache_dir)
50
51 def _load_t5_model(self, model_name_or_path, config, cache_dir):
~\miniconda3\envs\atoti\lib\site-packages\transformers\models\auto\auto_factory.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
445 elif type(config) in cls._model_mapping.keys():
446 model_class = _get_model_class(config, cls._model_mapping)
--> 447 return model_class.from_pretrained(pretrained_model_name_or_path, *model_args, config=config, **kwargs)
448 raise ValueError(
449 f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
~\miniconda3\envs\atoti\lib\site-packages\transformers\modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
1310 elif os.path.join(pretrained_model_name_or_path, FLAX_WEIGHTS_NAME):
1311 raise EnvironmentError(
-> 1312 f"Error no file named {WEIGHTS_NAME} found in directory {pretrained_model_name_or_path} but "
1313 "there is a file for Flax weights. Use `from_flax=True` to load this model from those "
1314 "weights."
OSError: Error no file named pytorch_model.bin found in directory paraphrase-mpnet-base-v2 but there is a file for Flax weights. Use `from_flax=True` to load this model from those weights.
I'm using the versions below:
transformers==4.16.2
torch==1.11.0+cu113
torchaudio==0.11.0+cu113
torchvision==0.12.0+cu113
sentence-transformers==2.2.0
faiss-cpu==1.7.2
sentencepiece==0.1.96
It has been two months since I last ran this, and all of a sudden it returns an error. I'm using FAISS-CPU as well.
The error is telling you that it cannot find the weights of the model you are trying to load.
Based on the error trace, I guess you are using the models object from the Sentence-Transformers library (correct me if I am wrong). One thing to note is that, as far as I can see, Sentence-Transformers lists only the following paraphrase models among its pretrained models:
paraphrase-multilingual-mpnet-base-v2
paraphrase-albert-small-v2
paraphrase-multilingual-MiniLM-L12-v2
paraphrase-MiniLM-L3-v2
Hence the one you want to load does not appear to be one of the Sentence-Transformers pretrained models.
That leads me to think that you are trying to load a model from your local machine.
I would suggest creating a Sentence-Transformers model like this:
from sentence_transformers import SentenceTransformer
model_path_or_name = "path/to/model" # A folder that contains model config files, including pytorch_model.bin
model = SentenceTransformer(model_path_or_name)
It is also possible that the pytorch_model.bin file was downloaded under a different filename, as mentioned in the SO thread here.
Let me know if this solves your problem. Cheers.
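Before loading from a local folder, it can help to check which weight files the folder actually contains. The following is a small sketch using only the standard library, not part of Sentence-Transformers. The helper name find_weight_files is hypothetical, pytorch_model.bin is the name from the error message, and flax_model.msgpack is assumed here to be the Flax weights file the error refers to.

```python
from pathlib import Path

# File names to look for. pytorch_model.bin comes from the error message;
# flax_model.msgpack is an assumption about the Flax weights file name.
WEIGHT_FILES = ("pytorch_model.bin", "flax_model.msgpack")


def find_weight_files(model_dir):
    """Return the known weight file names that exist directly in model_dir."""
    folder = Path(model_dir)
    return [name for name in WEIGHT_FILES if (folder / name).is_file()]


found = find_weight_files("path/to/model")  # placeholder path from the answer
if "pytorch_model.bin" not in found:
    print("pytorch_model.bin is missing; weight files found:", found)
```

If only the Flax file shows up, that would match the message in the traceback and suggests an incomplete or renamed download.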

Error while setting up roBERTa model in colab notebook

I am getting an error while loading the vocabulary and merges files into the tokenizer for a TensorFlow RoBERTa model. The code and the error are below.
Code:
tokenizer = tokenizers.ByteLevelBPETokenizer(vocab_file='vocab_roberta_base.json',
merges_file='merges_roberta_base.txt', lowercase=True,add_prefix_space=True)
ERROR:
Exception Traceback (most recent call last)
<ipython-input-9-5dab9f2389e4> in <module>()
1 MAX_LEN = 96
----> 2 tokenizer = tokenizers.ByteLevelBPETokenizer(vocab_file='vocab_roberta_base.json',merges_file='merges_roberta_base.txt')
3 sentiment_id = {'positive': 1313, 'negative': 2430, 'neutral': 7974}
/usr/local/lib/python3.6/dist-packages/tokenizers/implementations/byte_level_bpe.py in __init__(self, vocab_file, merges_file, add_prefix_space, lowercase, dropout, unicode_normalizer, continuing_subword_prefix, end_of_word_suffix)
31 dropout=dropout,
32 continuing_subword_prefix=continuing_subword_prefix or "",
---> 33 end_of_word_suffix=end_of_word_suffix or "",
34 )
35 )
Exception: expected ident at line 1 column 2

How to create a net which takes unlabeled "dummy data" as input?

I am currently working through caffe/examples/ to learn more about caffe/pycaffe.
In the 02-fine-tuning.ipynb notebook there is a code cell which shows how to create a CaffeNet that takes unlabeled "dummy data" as input, allowing us to set its input images externally. The notebook can be found here:
https://github.com/BVLC/caffe/blob/master/examples/02-fine-tuning.ipynb
This code cell from the notebook throws an error:
dummy_data = L.DummyData(shape=dict(dim=[1, 3, 227, 227]))
imagenet_net_filename = caffenet(data=dummy_data, train=False)
imagenet_net = caffe.Net(imagenet_net_filename, weights, caffe.TEST)
error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-6-9f0ecb4d95e6> in <module>()
1 dummy_data = L.DummyData(shape=dict(dim=[1, 3, 227, 227]))
----> 2 imagenet_net_filename = caffenet(data=dummy_data, train=False)
3 imagenet_net = caffe.Net(imagenet_net_filename, weights, caffe.TEST)
<ipython-input-5-53badbea969e> in caffenet(data, label, train, num_classes, classifier_name, learn_all)
68 # write the net to a temporary file and return its filename
69 with tempfile.NamedTemporaryFile(delete=False) as f:
---> 70 f.write(str(n.to_proto()))
71 return f.name
~/anaconda3/envs/testcaffegpu/lib/python3.6/tempfile.py in func_wrapper(*args, **kwargs)
481 #_functools.wraps(func)
482 def func_wrapper(*args, **kwargs):
--> 483 return func(*args, **kwargs)
484 # Avoid closing the file as long as the wrapper is alive,
485 # see issue #18879.
TypeError: a bytes-like object is required, not 'str'
Does anyone know how to do this correctly?
tempfile.NamedTemporaryFile() opens a file in binary mode ('w+b') by default. You are using Python 3.x, where str is a text type distinct from bytes (unlike Python 2.x), so passing a string to f.write() raises an error because the file expects bytes. Opening the file in text mode instead should avoid this error.
Replace
with tempfile.NamedTemporaryFile(delete=False) as f:
with
with tempfile.NamedTemporaryFile(delete=False, mode='w') as f:
This has been explained in a previous post:
TypeError: 'str' does not support the buffer interface
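The difference between the two modes can be checked with a short standalone sketch. It uses only the standard library; the function names and the sample text are made up for illustration, and the sample text merely stands in for str(n.to_proto()).

```python
import os
import tempfile


def write_text_to_temp(text):
    """Write text to a named temporary file opened in text mode; return its path."""
    with tempfile.NamedTemporaryFile(delete=False, mode="w") as f:
        f.write(text)
    return f.name


def default_mode_rejects_str(text):
    """Return True if the default binary mode ('w+b') refuses a str."""
    with tempfile.NamedTemporaryFile() as f:
        try:
            f.write(text)
        except TypeError:
            return True
    return False


path = write_text_to_temp("name: 'demo'\n")  # stand-in for str(n.to_proto())
with open(path) as f:
    print(f.read())
os.remove(path)  # delete=False leaves the file behind, so clean it up here

print(default_mode_rejects_str("some text"))
```

Keeping the default binary mode and writing str(n.to_proto()).encode() instead would be another way to satisfy f.write(), though mode='w' is the smaller change.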