Running Stanford POS tagger in NLTK leads to "not a valid Win32 application" on Windows - nltk

I am trying to use stanford POS tagger in NLTK by the following code:
import nltk
from nltk.tag.stanford import POSTagger
st = POSTagger('E:\Assistant\models\english-bidirectional-distsim.tagger',
               'E:\Assistant\stanford-postagger.jar')
st.tag('What is the airspeed of an unladen swallow?'.split())
and here is the output:
Traceback (most recent call last):
File "E:\J2EE\eclipse\WSNLP\nlp\src\tagger.py", line 5, in <module>
st.tag('What is the airspeed of an unladen swallow?'.split())
File "C:\Python34\lib\site-packages\nltk\tag\stanford.py", line 59, in tag
return self.tag_sents([tokens])[0]
File "C:\Python34\lib\site-packages\nltk\tag\stanford.py", line 81, in tag_sents
stdout=PIPE, stderr=PIPE)
File "C:\Python34\lib\site-packages\nltk\internals.py", line 153, in java
p = subprocess.Popen(cmd, stdin=stdin, stdout=stdout, stderr=stderr)
File "C:\Python34\lib\subprocess.py", line 858, in __init__
restore_signals, start_new_session)
File "C:\Python34\lib\subprocess.py", line 1111, in _execute_child
startupinfo)
OSError: [WinError 193] %1 is not a valid Win32 application
P.S. My JAVA_HOME is set and I have no problem with my Java installation. Can someone explain what this error means? It is not informative to me. Thanks in advance.

Looks like your Java installation is botched or missing.

It worked after a lot of trial and error:
It seems that NLTK's internals cannot find the Java binary automatically on Windows, so we need to point to it explicitly, as follows:
import os
import nltk
from nltk.tag.stanford import POSTagger
os.environ['JAVA_HOME'] = r'C:\Program Files\Java\jre6\bin'
st = POSTagger(r'E:\stanford-postagger-2014-10-26\models\english-left3words-distsim.tagger',
               r'E:\stanford-postagger-2014-10-26\stanford-postagger.jar')
st.tag(nltk.word_tokenize('What is the airspeed of an unladen swallow?'))
As one of the gurus reminded me: don't forget to add the r prefix when working with backslashes ("\") in string literals.
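For what it's worth, newer NLTK releases renamed the class to StanfordPOSTagger; a minimal sketch of the same fix with that class (the paths are the examples from above, adjust them to your own install):
import os
from nltk.tag import StanfordPOSTagger  # renamed from POSTagger in newer NLTK releases

# Point NLTK at the Java installation explicitly (example path, use your own).
os.environ['JAVA_HOME'] = r'C:\Program Files\Java\jre6\bin'

st = StanfordPOSTagger(
    r'E:\stanford-postagger-2014-10-26\models\english-left3words-distsim.tagger',
    r'E:\stanford-postagger-2014-10-26\stanford-postagger.jar')
print(st.tag('What is the airspeed of an unladen swallow?'.split()))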

Related

How to resolve coreferences without Internet using AllenNLP and coref-spanbert-large?

I want to resolve coreferences without Internet access using AllenNLP and the coref-spanbert-large model.
I am trying to do it the way described here: https://demo.allennlp.org/coreference-resolution
My code:
from allennlp.predictors.predictor import Predictor
import allennlp_models.tagging
predictor = Predictor.from_path(r"C:\Users\aap\Desktop\coref-spanbert-large-2021.03.10.tar.gz")
example = 'Paul Allen was born on January 21, 1953, in Seattle, Washington, to Kenneth Sam Allen and Edna Faye Allen.Allen attended Lakeside School, a private school in Seattle, where he befriended Bill Gates, two years younger, with whom he shared an enthusiasm for computers.'
pred = predictor.predict(document=example)
coref_res = predictor.coref_resolved(example)
print(pred)
print(coref_res)
When I have Internet access the code works correctly.
But when I don't have Internet access I get the following error:
Traceback (most recent call last):
File "C:/Users/aap/Desktop/CoreNLP/Coref_AllenNLP.py", line 14, in <module>
predictor = Predictor.from_path(r"C:\Users\aap\Desktop\coref-spanbert-large-2021.03.10.tar.gz")
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\predictors\predictor.py", line 361, in from_path
load_archive(archive_path, cuda_device=cuda_device, overrides=overrides),
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\models\archival.py", line 206, in load_archive
config.duplicate(), serialization_dir
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\models\archival.py", line 232, in _load_dataset_readers
dataset_reader_params, serialization_dir=serialization_dir
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 604, in from_params
**extras,
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 632, in from_params
kwargs = create_kwargs(constructor_to_inspect, cls, params, **extras)
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 200, in create_kwargs
cls.__name__, param_name, annotation, param.default, params, **extras
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 307, in pop_and_construct_arg
return construct_arg(class_name, name, popped_params, annotation, default, **extras)
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 391, in construct_arg
**extras,
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 341, in construct_arg
return annotation.from_params(params=popped_params, **subextras)
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 604, in from_params
**extras,
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 634, in from_params
return constructor_to_call(**kwargs) # type: ignore
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\data\token_indexers\pretrained_transformer_mismatched_indexer.py", line 63, in __init__
**kwargs,
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\data\token_indexers\pretrained_transformer_indexer.py", line 58, in __init__
model_name, tokenizer_kwargs=tokenizer_kwargs
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\data\tokenizers\pretrained_transformer_tokenizer.py", line 71, in __init__
model_name, add_special_tokens=False, **tokenizer_kwargs
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\cached_transformers.py", line 110, in get_tokenizer
**kwargs,
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 362, in from_pretrained
config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\models\auto\configuration_auto.py", line 368, in from_pretrained
config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\configuration_utils.py", line 424, in get_config_dict
use_auth_token=use_auth_token,
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\file_utils.py", line 1087, in cached_path
local_files_only=local_files_only,
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\file_utils.py", line 1268, in get_from_cache
"Connection error, and we cannot find the requested files in the cached path."
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.
Process finished with exit code 1
Please tell me, what do I need to do so that my code works without the Internet?
You will need a local copy of the transformer model's configuration file and vocabulary so that the tokenizer and token indexer don't need to download them:
from transformers import AutoConfig, AutoTokenizer
# transformer_model_name is the Hugging Face model that the archive uses
# (check the model_name entries in the archive's config.json);
# local_config_path is any local directory where you want to save the files.
tokenizer = AutoTokenizer.from_pretrained(transformer_model_name)
config = AutoConfig.from_pretrained(transformer_model_name)
tokenizer.save_pretrained(local_config_path)
config.to_json_file(local_config_path + "/config.json")
You will then need to override the transformer model name in the archive's configuration so that it points to the local directory (local_config_path) where you saved these files:
predictor = Predictor.from_path(
    r"C:\Users\aap\Desktop\coref-spanbert-large-2021.03.10.tar.gz",
    overrides={
        "dataset_reader.token_indexers.tokens.model_name": local_config_path,
        "validation_dataset_reader.token_indexers.tokens.model_name": local_config_path,
        "model.text_field_embedder.tokens.model_name": local_config_path,
    },
)
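If you are not sure which keys to override, one way to check (a sketch, assuming the usual AllenNLP archive layout with config.json at the top level of the .tar.gz) is to peek into the archive's configuration:
import json
import tarfile

# Inspect the archive's config.json to see where the transformer model name appears.
with tarfile.open(r"C:\Users\aap\Desktop\coref-spanbert-large-2021.03.10.tar.gz") as tar:
    config = json.load(tar.extractfile("config.json"))

print(json.dumps(config["dataset_reader"], indent=2))  # contains token_indexers.tokens.model_name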
I ran into a similar problem when using structured-prediction-srl-bert without Internet access, and I saw four items being downloaded in the logs:
dataset_reader.bert_model_name = bert-base-uncased, Downloading 4 files
model INFO vocabulary.py - Loading token dictionary from data/structured-prediction-srl-bert.2020.12.15/vocabulary. Downloading... 4x smaller files
Spacy models 'en_core_web_sm' not found
and later on: [nltk_data] Error loading punkt: <urlopen error [Errno -3] Temporary failure in name resolution> [nltk_data] Error loading wordnet: <urlopen error [Errno -3] Temporary failure in name resolution>
I have solved it with these steps:
structured-prediction-srl-bert:
I downloaded structured-prediction-srl-bert.2020.12.15.tar.gz from https://demo.allennlp.org/semantic-role-labeling (Model Card tab):
https://storage.googleapis.com/allennlp-public-models/structured-prediction-srl-bert.2020.12.15.tar.gz
I unpacked it into ./data/structured-prediction-srl-bert.2020.12.15
The code:
pip install allennlp==2.10.0 allennlp-models==2.10.0
from allennlp.predictors.predictor import Predictor
predictor = Predictor.from_path("./data/structured-prediction-srl-bert.2020.12.15/")
bert-base-uncased
I created a folder ./data/bert-base-uncased and downloaded these files into it from https://huggingface.co/bert-base-uncased/tree/main:
config.json
tokenizer.json
tokenizer_config.json
vocab.txt
pytorch_model.bin
Additionally, I had to change "bert_model_name" from "bert-base-uncased" to the path "./data/bert-base-uncased", since the former triggers the download. This has to be done in ./data/structured-prediction-srl-bert.2020.12.15/config.json, where it occurs twice (a small sketch of this edit follows after these steps).
python -m spacy download en_core_web_sm
python -c 'import nltk; nltk.download("punkt"); nltk.download("wordnet")'
After these steps, AllenNLP did not need the Internet anymore.
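For reference, a minimal sketch of the config.json edit described above (same paths as in the steps; it simply rewrites every "bert-base-uncased" reference to the local copy):
from pathlib import Path

# Rewrite both bert_model_name occurrences to point at the local ./data/bert-base-uncased copy.
cfg = Path("./data/structured-prediction-srl-bert.2020.12.15/config.json")
cfg.write_text(cfg.read_text().replace('"bert-base-uncased"', '"./data/bert-base-uncased"'))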

Has PackageLoader changed with Jinja2 (3.0.1) and Python3 (3.9.5)?

I'm using Jinja2 (3.0.1), Python3 (3.9.5), and macOS (11.3.1).
These lines used to work:
from jinja2 import Environment, PackageLoader
e = Environment(loader = PackageLoader("__main__", "."))
but now produce:
Traceback (most recent call last):
File "/Users/downing/Dropbox/jinja2/Jinja2.py", line 36, in <module>
main()
File "/Users/downing/Dropbox/jinja2/Jinja2.py", line 31, in main
e = Environment(loader = PackageLoader("__main__", "."))
File "/usr/local/lib/python3.9/site-packages/jinja2/loaders.py", line 286, in __init__
spec = importlib.util.find_spec(package_name)
File "/usr/local/Cellar/python#3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/importlib/util.py", line 114, in find_spec
raise ValueError('{}.__spec__ is None'.format(name))
ValueError: __main__.__spec__ is None
Just discovered that FileSystemLoader still works and looks more appropriate for my use anyway.
The PackageLoader docs say:
Changed in version 3.0: No longer uses setuptools as a dependency.
which seems significant.
__main__.__spec__ can be None if __main__ wasn't imported as a module, i.e. when the script is run directly. Running python -m Jinja2 would probably still work for you, even if python Jinja2.py doesn't.
Like you, I switched to FileSystemLoader for this case, but I added a conditional to continue using PackageLoader when possible. That allows python -m path.to.my.module to continue working if the module is installed in a ZIP file or egg:
import os
import jinja2

if __spec__ is not None:
    loader = jinja2.PackageLoader('__main__')
else:
    loader = jinja2.FileSystemLoader(
        os.path.join(os.path.dirname(__file__), 'templates')
    )
env = jinja2.Environment(loader=loader)
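A minimal usage example with the environment built above (the template filename is hypothetical; use one that exists in your templates directory or package):
template = env.get_template("page.html")  # hypothetical template file
print(template.render(title="Hello"))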

Error while using nltk.pos_tag

I have been trying to use nltk.pos_tag in my code but I get an error when I do so. I have already downloaded the Penn Treebank and max_ent_treebank_pos models, but the error persists. Here is my code:
import nltk
from nltk import tag
from nltk import *
a = "Alan Shearer is the first player to score over a hundred Premier League goals."
a_sentences = nltk.sent_tokenize(a)
a_words = [nltk.word_tokenize(sentence) for sentence in a_sentences]
a_pos = [nltk.pos_tag(sentence) for sentence in a_words]
print(a_pos)
and this is the error I get:
Traceback (most recent call last):
File "<pyshell#9>", line 1, in <module>
print (nltk.pos_tag(text))
File "C:\Python34\lib\site-packages\nltk\tag\__init__.py", line 110, in pos_tag
tagger = PerceptronTagger()
File "C:\Python34\lib\site-packages\nltk\tag\perceptron.py", line 140, in __init__
AP_MODEL_LOC = 'file:'+str(find('taggers/averaged_perceptron_tagger/'+PICKLE))
File "C:\Python34\lib\site-packages\nltk\data.py", line 641, in find
raise LookupError(resource_not_found)
LookupError:
Resource 'taggers/averaged_perceptron_tagger/averaged_perceptron
_tagger.pickle' not found. Please use the NLTK Downloader to
obtain the resource: >>> nltk.download()
Searched in:
- 'C:\\Users\\T01142/nltk_data'
- 'C:\\nltk_data'
- 'D:\\nltk_data'
- 'E:\\nltk_data'
- 'C:\\Python34\\nltk_data'
- 'C:\\Python34\\lib\\nltk_data'
- 'C:\\Users\\T01142\\AppData\\Roaming\\nltk_data'
Call this from python:
nltk.download('averaged_perceptron_tagger')
I had the same problem in a Flask server. NLTK used a different path under the server configuration, so I resorted to adding nltk.data.path.append("/home/yourusername/whateverpath/") inside the server code right before the pos_tag call (see the sketch below).
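A minimal sketch of that workaround (the appended path is a placeholder; it must be the nltk_data directory that actually contains taggers/averaged_perceptron_tagger):
import nltk

# Hypothetical path; point it at the nltk_data directory the server process can read.
nltk.data.path.append("/home/yourusername/nltk_data")

print(nltk.pos_tag("Alan Shearer scored a goal".split()))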
Note that there is some duplication of this question:
How to config nltk data directory from code?
nltk doesn't add $NLTK_DATA to search path?
POS tagging with NLTK. Can't locate averaged_perceptron_tagger
To resolve this error, run the following commands at the Python prompt:
import nltk
nltk.download('averaged_perceptron_tagger')
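After the download finishes, a quick check (tokenizing with split() so no other NLTK resources are needed):
print(nltk.pos_tag("Alan Shearer is the first player to score over a hundred Premier League goals".split()))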

Cython: Trying to wrap SFML Window; getting "ImportError: No module named 'ExprNodes'"

sfml.pxd:
cdef extern from "SFML/Window.hpp" namespace "sf":
cdef cppclass VideoMode:
VideoMode(unsigned int, unsigned int) except +
cdef cppclass Window:
Window(VideoMode, String) except +
void display()
display.pyx:
cimport sfml

cdef class Window:
    cdef sfml.Window* _this
    def __cinit__(self, unsigned int width, unsigned int height):
        self._this = new sfml.Window(sfml.VideoMode(width, height), "title")
    def __dealloc__(self):
        del self._this
    def display(self):
        self._this.display()
setup.py:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

setup(
    cmdclass = {'build_ext': build_ext},
    ext_modules = [
        Extension("display", ["display.pyx"],
                  language='c++',
                  libraries=["sfml-system", "sfml-window"])
    ]
)
The error when running python setup.py build:
running build
running build_ext
cythoning display.pyx to display.cpp
Traceback (most recent call last):
File "setup.py", line 10, in <module>
libraries=["sfml-system", "sfml-window"])
File "/usr/lib/python3.3/distutils/core.py", line 148, in setup
dist.run_commands()
File "/usr/lib/python3.3/distutils/dist.py", line 917, in run_commands
self.run_command(cmd)
File "/usr/lib/python3.3/distutils/dist.py", line 936, in run_command
cmd_obj.run()
File "/usr/lib/python3.3/distutils/command/build.py", line 126, in run
self.run_command(cmd_name)
File "/usr/lib/python3.3/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/lib/python3.3/distutils/dist.py", line 936, in run_command
cmd_obj.run()
File "/usr/lib/python3.3/site-packages/Cython/Distutils/build_ext.py", line 163, in run
_build_ext.build_ext.run(self)
File "/usr/lib/python3.3/distutils/command/build_ext.py", line 354, in run
self.build_extensions()
File "/usr/lib/python3.3/site-packages/Cython/Distutils/build_ext.py", line 170, in build_extensions
ext.sources = self.cython_sources(ext.sources, ext)
File "/usr/lib/python3.3/site-packages/Cython/Distutils/build_ext.py", line 317, in cython_sources
full_module_name=module_name)
File "/usr/lib/python3.3/site-packages/Cython/Compiler/Main.py", line 608, in compile
return compile_single(source, options, full_module_name)
File "/usr/lib/python3.3/site-packages/Cython/Compiler/Main.py", line 549, in compile_single
return run_pipeline(source, options, full_module_name)
File "/usr/lib/python3.3/site-packages/Cython/Compiler/Main.py", line 386, in run_pipeline
from . import Pipeline
File "/usr/lib/python3.3/site-packages/Cython/Compiler/Pipeline.py", line 7, in <module>
from .Visitor import CythonTransform
File "Visitor.py", line 10, in init Cython.Compiler.Visitor (/build/src/Cython-0.19/Cython/Compiler/Visitor.c:15987)
ImportError: No module named 'ExprNodes'
Apparently, it can't find something called 'ExprNodes', but I don't think that my Cython installation is broken, because I managed to successfully wrap a different C++ library some time ago, and I didn't run into this problem.
I'm using Cython 0.19.
I would appreciate any help/insight that you could offer.
Thanks.
Looking more closely at the traceback, I see that Cython fails inside its own compiled code. It may indeed be a bug; sorry for missing it the first time.
What can you do:
Create a clean virtualenv, install Cython there and check if it works. (Version 0.19.1 is the latest).
Create another virtualenv, but this time install Cython using python setup.py install --no-cython-compile.
If either of these fails, please post your detailed configuration (Linux distro and version, Python version, gcc version, etc.) to the cython-devel mailing list.
BTW does your old successful project still compile?

Loading CPP functions into Python using Ctypes

I have a question about ctypes and do not know what I am doing wrong. And yes, I am a Python newbie and have searched the other posts on here, so any advice is appreciated.
What I want to do:
I simply want to load the FXCM C++ API functions into Python 3.3 so I can call them to connect to their server.
ctypes seems to be the best tool for that. So here is some simple Python code:
import os
dirlist = os.listdir('ForexConnectAPIx64/bin')
from pprint import pprint
pprint(dirlist)
from ctypes import *
myDll = cdll.LoadLibrary ("ForexConnectAPIx64/bin/ForexConnect.dll")
gives this result:
Traceback (most recent call last):
File "C:\Users\scaberia3\Python_Projects\FTC\ListDir_Test.py", line 20, in <module>
myDll = cdll.LoadLibrary ("ForexConnectAPIx64/bin/ForexConnect.dll")
File "C:\Python33\lib\ctypes\__init__.py", line 431, in LoadLibrary
return self._dlltype(name)
File "C:\Python33\lib\ctypes\__init__.py", line 353, in __init__
self._handle = _dlopen(self._name, mode)
OSError: [WinError 126] Das angegebene Modul wurde nicht gefunden (The specified module could not be found)
['ForexConnect.dll',
'fxmsg.dll',
'gsexpat.dll',
'gslibeay32.dll',
'gsssleay32.dll',
'gstool2.dll',
'gszlib.dll',
'java',
'log4cplus.dll',
'msvcp80.dll',
'msvcr80.dll',
'net',
'pdas.dll']
This means the path is correct and ForexConnect.dll is present, so I must be doing something very simple wrong, but I have no clue what.
WinError 126 here usually means that one of ForexConnect.dll's own dependencies cannot be found, not the DLL itself. You can either use Dependency Walker to figure out the correct sequence in which to load the DLLs manually, or simply add the directory to the system search path:
dllpath = os.path.abspath('ForexConnectAPIx64/bin')
os.environ['PATH'] += os.pathsep + dllpath  # let Windows resolve the dependent DLLs in the same folder
myDLL = CDLL('ForexConnect.dll')
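As a side note, on newer Python versions (3.8 and later) Windows no longer searches PATH for a DLL's dependencies by default, so a sketch of the equivalent approach there (same example paths as above) would be:
import os
from ctypes import CDLL

bin_dir = os.path.abspath('ForexConnectAPIx64/bin')
os.add_dll_directory(bin_dir)  # register the folder so fxmsg.dll and the other dependencies can be resolved
myDll = CDLL(os.path.join(bin_dir, 'ForexConnect.dll'))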