I get an error message when I try FreqDist() in NLTK -- NameError: name 'nltk' is not defined

I'm learning NLTK, and on my Mac everything works fine except FreqDist(). (I saw another question about FreqDist(), but that poster was getting a different error message: TypeError: unhashable type: 'list'.)
Here's an example:
>>> from nltk.corpus import brown
>>> news_text = brown.words(categories='news')
>>> fdist = nltk.FreqDist([w.lower() for w in news_text])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'nltk' is not defined
This error is consistent: I get it every time I call FreqDist(). Other commands, such as >>> brown.fileids(), work fine.
Thanks for your help!

Before you can use nltk.FreqDist, you need to import nltk. Your session only ran from nltk.corpus import brown, which defines the name brown but not nltk.
Add a line as follows:
import nltk
Or, if you just want to use FreqDist, try this:
>>> from nltk.corpus import brown
>>> from nltk import FreqDist
>>> news_text = brown.words(categories='news')
>>> fdist = FreqDist([w.lower() for w in news_text])
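The same behavior can be reproduced without NLTK. The sketch below is only an illustration that uses standard-library modules as stand-ins; names_bound_by is a hypothetical helper, not part of NLTK. It shows that a from-import binds only the imported name, so the package name itself stays undefined.

```python
def names_bound_by(import_statement):
    """Run one import statement in a fresh namespace and list the names it binds."""
    namespace = {}
    exec(import_statement, namespace)
    # exec() adds __builtins__ to the namespace; skip dunder entries like that.
    return sorted(name for name in namespace if not name.startswith("__"))


# Stand-in for `from nltk.corpus import brown`: only the imported name is bound.
print(names_bound_by("from collections.abc import Mapping"))  # ['Mapping']

# Stand-in for `import nltk`: the top-level package name is bound.
print(names_bound_by("import collections"))  # ['collections']
```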

If import nltk itself raises an error, it means you haven't installed nltk.
Follow these steps to install nltk:
1: Go to https://pypi.python.org/pypi/setuptools. At the end of the page you will find setuptools-7.0.zip (md5). Download it and unzip it; inside you will find the easy_install.py Python script.
2: Run sudo easy_install pip (make sure you are in the directory that contains the easy_install script). pip is then installed and ready to use.
3: Run sudo pip install -U nltk. If the command completes successfully, nltk is installed.
4: Open IDLE and type the following:
import nltk
If nltk is installed properly, you are returned to the prompt without an error.
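The check in step 4 can also be scripted. A minimal sketch, assuming only the standard library; is_installed is a hypothetical helper name, and it only checks that the module can be located, not that it imports cleanly.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located on the import path."""
    return importlib.util.find_spec(module_name) is not None


for name in ("json", "nltk"):
    print(name, "is installed" if is_installed(name) else "is NOT installed")
```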

setuptools is only required for older versions of Python. You don't need it if you are running 3.2+.
You can download nltk directly from https://pypi.python.org/pypi/nltk
For more information, see http://www.nltk.org/install.html

nltk requires data that you need to download first.
Run the following code:
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
stopwords.words("english")

Related

ModuleNotFoundError: No module named 'torchvision.models.feature_extraction'

I want to extract features in ResNet101, however, I have trouble importing torchvision.models.feature_extraction.
Here is my code:
from torchvision import models
from torchvision.models.feature_extractor import create_feature_extractor
res101 = models.resnet101(pretrained=True)
extractor = create_feature_extractor(
    res101,
    return_nodes=[
        "conv1",
        "maxpool",
        "layer1",
        "layer2",
        "layer3",
        "layer4",
    ]
)
features = extractor(inputs)
And here is the error
from torchvision.models.feature_extractor import create_feature_extractor
Traceback (most recent call last):
Input In [11] in <cell line: 1>
from torchvision.models.feature_extractor import create_feature_extractor
ModuleNotFoundError: No module named 'torchvision.models.feature_extractor'
The module is named feature_extraction, not feature_extractor. You probably meant:
from torchvision.models.feature_extraction import create_feature_extractor
Note the difference: extraction vs. extractor.
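A misspelled module name like this can be caught mechanically. The sketch below uses difflib from the standard library. The candidates list is a hand-written assumption for illustration, not the real contents of torchvision.models; with torchvision installed, pkgutil.iter_modules(torchvision.models.__path__) could supply the actual names.

```python
import difflib


def suggest_module(wrong_name, candidates):
    """Return the candidate closest to wrong_name, or None if nothing is similar."""
    matches = difflib.get_close_matches(wrong_name, candidates, n=1)
    return matches[0] if matches else None


# Hypothetical list of submodule names, written by hand for this example.
candidates = ["detection", "feature_extraction", "resnet", "segmentation"]
print(suggest_module("feature_extractor", candidates))  # feature_extraction
```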
Same problem. I installed PyTorch using conda, and it works fine in Jupyter notebooks, but it does not work in the terminal.
It turned out that the torchvision version listed by pip was 0.82.
I solved it by updating torchvision using pip.
Maybe some other package had installed the old version for me. Hope my experience helps you.
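A mismatch like this (one environment in Jupyter, another in the terminal) can be diagnosed by printing which interpreter is running and which version of a distribution it sees. A minimal sketch using importlib.metadata (Python 3.8+); installed_version is a hypothetical helper name.

```python
import sys
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


print("interpreter:", sys.executable)
print("torchvision:", installed_version("torchvision"))
```

Running this in both the notebook and the terminal shows whether the two are using the same environment.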

ModuleNotFoundError: No module named 'fastai.vision'

I am trying to use ImageDataBunch from fastai. It worked fine, but recently when I ran my code it raised ModuleNotFoundError: No module named 'fastai.vision'.
I then upgraded fastai with pip install fastai --upgrade. That error went away, but now I get NameError: name 'ImageDataBunch' is not defined.
Here's my code:
import warnings
import numpy as np
from fastai.vision import *
warnings.filterwarnings("ignore", category=UserWarning, module="torch.nn.functional")
np.random.seed(42)
data = ImageDataBunch.from_folder(path, train='.', valid_pct=0.2,
                                  ds_tfms=get_transforms(), size=224, num_workers=4, no_check=True).normalize(imagenet_stats)
How can I fix this?
I actually ran into this same issue when I started using Colab, but haven't been able to reproduce it. Here was the thread describing what I and another developer did to troubleshoot: https://forums.fast.ai/t/no-module-named-fastai-data-in-google-colab/78164/4
I would recommend trying a factory reset of your runtime ("Runtime" -> "Factory Reset Runtime").
Then you can check which version of fastai you have (if you've already imported it, you have to restart the runtime to use the new version):
import fastai
fastai.__version__
I'm able to run from fastai.vision import * on fastai versions 1.0.61 and 2.0.13.
In Google Colab:
Upgrade fastai on colab:
! [ -e /content ] && pip install -Uqq fastai
Import necessary libraries:
from fastai.vision.all import *
from fastai.text.all import *
from fastai.collab import *
from fastai.tabular.all import *
Get the images and annotations:
path = untar_data(URLs.PETS)
path_anno = path/'annotations'
path_img = path/'images'
print( path_img.ls() ) # print all images
fnames = get_image_files(path_img) # -->> 7390 images
print(fnames[:5]) # print first 5 images
The solution that worked for me was to copy to (connect) my Google Drive and then run the cells.
You might have installed an older version of fastai. You need to upgrade to fastai v2, which you can do with pip as shown below.
!pip install fastai --upgrade
Also check your fastai version using
import fastai
print(fastai.__version__)
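The NameError after upgrading is consistent with the v1-to-v2 rename: fastai v2 replaced ImageDataBunch with ImageDataLoaders, and its star import lives in fastai.vision.all. The helper below is a hypothetical sketch (vision_api_for is not part of fastai) that maps a version string to the names to use for that major version.

```python
def vision_api_for(version_string):
    """Map a fastai version string to the vision import line and loader class name."""
    major = int(version_string.split(".")[0])
    if major >= 2:
        return ("from fastai.vision.all import *", "ImageDataLoaders")
    return ("from fastai.vision import *", "ImageDataBunch")


for version in ("1.0.61", "2.0.13"):
    import_line, loader_class = vision_api_for(version)
    print(version, "->", import_line, "|", loader_class)
```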

How to install Punkt Sentence Tokenizer [duplicate]

Updated answer: NLTK works well with Python 2.7. I had 3.2; I uninstalled it and installed 2.7, and now it works!
I have installed NLTK and tried to download NLTK Data. I followed the instructions on this site: http://www.nltk.org/data.html
I downloaded NLTK, installed it, and then tried to run the following code:
>>> import nltk
>>> nltk.download()
It gave me the following error message:
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    nltk.download()
AttributeError: 'module' object has no attribute 'download'
Directory of C:\Python32\Lib\site-packages
I tried both nltk.download() and nltk.downloader(); both gave me error messages.
Then I used help(nltk) to inspect the package. It shows the following info:
NAME
nltk
PACKAGE CONTENTS
align
app (package)
book
ccg (package)
chat (package)
chunk (package)
classify (package)
cluster (package)
collocations
corpus (package)
data
decorators
downloader
draw (package)
examples (package)
featstruct
grammar
help
inference (package)
internals
lazyimport
metrics (package)
misc (package)
model (package)
parse (package)
probability
sem (package)
sourcedstring
stem (package)
tag (package)
test (package)
text
tokenize (package)
toolbox
tree
treetransforms
util
yamltags
FILE
c:\python32\lib\site-packages\nltk
I do see downloader there, so I'm not sure why it does not work. I'm on Python 3.2.2, Windows Vista.
TL;DR
To download a particular dataset or model, use the nltk.download() function. For example, to download the punkt sentence tokenizer, use:
$ python3
>>> import nltk
>>> nltk.download('punkt')
If you're unsure of which data/model you need, you can start out with the basic list of data + models with:
>>> import nltk
>>> nltk.download('popular')
This downloads a list of "popular" resources, which includes:
<collection id="popular" name="Popular packages">
  <item ref="cmudict" />
  <item ref="gazetteers" />
  <item ref="genesis" />
  <item ref="gutenberg" />
  <item ref="inaugural" />
  <item ref="movie_reviews" />
  <item ref="names" />
  <item ref="shakespeare" />
  <item ref="stopwords" />
  <item ref="treebank" />
  <item ref="twitter_samples" />
  <item ref="omw" />
  <item ref="wordnet" />
  <item ref="wordnet_ic" />
  <item ref="words" />
  <item ref="maxent_ne_chunker" />
  <item ref="punkt" />
  <item ref="snowball_data" />
  <item ref="averaged_perceptron_tagger" />
</collection>
EDITED
In case anyone wants to avoid errors when downloading larger datasets from nltk (from https://stackoverflow.com/a/38135306/610569):
$ rm /Users/<your_username>/nltk_data/corpora/panlex_lite.zip
$ rm -r /Users/<your_username>/nltk_data/corpora/panlex_lite
$ python
>>> import nltk
>>> dler = nltk.downloader.Downloader()
>>> dler._update_index()
>>> dler._status_cache['panlex_lite'] = 'installed' # Trick the index into treating panlex_lite as if it's already installed.
>>> dler.download('popular')
Updated
From v3.2.5, NLTK has a more informative error message when nltk_data resource is not found, e.g.:
>>> from nltk import word_tokenize
>>> word_tokenize('x')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/alvas/git/nltk/nltk/tokenize/__init__.py", line 128, in word_tokenize
    sentences = [text] if preserve_line else sent_tokenize(text, language)
  File "/Users/alvas/git/nltk/nltk/tokenize/__init__.py", line 94, in sent_tokenize
    tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
  File "/Users/alvas/git/nltk/nltk/data.py", line 820, in load
    opened_resource = _open(resource_url)
  File "/Users/alvas/git/nltk/nltk/data.py", line 938, in _open
    return find(path_, path + ['']).open()
  File "/Users/alvas/git/nltk/nltk/data.py", line 659, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
Searched in:
- '/Users/alvas/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- ''
**********************************************************************
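The LookupError shown above can be used to download a resource only when it is missing. A sketch under assumptions: ensure_resource is a hypothetical helper, and the finder and downloader are passed in as arguments so the logic can be demonstrated without NLTK or network access. With NLTK installed one would pass nltk.data.find and nltk.download.

```python
def ensure_resource(resource_path, package_id, find, download):
    """Download package_id only if find(resource_path) raises LookupError.

    Returns True if a download was triggered, False if the resource was present.
    """
    try:
        find(resource_path)
        return False
    except LookupError:
        download(package_id)
        return True


# In-memory stand-ins for nltk.data.find and nltk.download (demo only).
installed = set()


def fake_find(resource_path):
    if resource_path not in installed:
        raise LookupError("Resource %s not found." % resource_path)


def fake_download(package_id):
    installed.add("tokenizers/" + package_id)


print(ensure_resource("tokenizers/punkt", "punkt", fake_find, fake_download))  # True
print(ensure_resource("tokenizers/punkt", "punkt", fake_find, fake_download))  # False

# With NLTK installed, the real call would look like:
# import nltk
# ensure_resource("tokenizers/punkt", "punkt", nltk.data.find, nltk.download)
```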
Related
To find nltk_data directory (auto-magically), see https://stackoverflow.com/a/36383314/610569
To download nltk_data to a different path, see https://stackoverflow.com/a/48634212/610569
To configure the nltk_data path (i.e. set a different path for NLTK to find nltk_data), see https://stackoverflow.com/a/22987374/610569
Try
nltk.download('all')
This downloads all the data, so there is no need to download packages individually.
Install pip: in a terminal, run sudo easy_install pip
Install NumPy (optional): run sudo pip install -U numpy
Install NLTK: run sudo pip install -U nltk
Test the installation: run python, then type import nltk
To download the corpora, run python -m nltk.downloader all
Do not name your file nltk.py. I used the same code in a file named nltk and got the same error as you. After I changed the file name, it worked.
This worked for me:
nltk.set_proxy('http://user:password@proxy.example.com:8080')
nltk.download()
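When credentials are embedded in a proxy URL, the user and password go before an @ sign, and special characters in them have to be percent-encoded. A minimal sketch with urllib.parse from the standard library; proxy_url is a hypothetical helper, and the host, port, and credentials are placeholders.

```python
from urllib.parse import quote


def proxy_url(host, port, user, password):
    """Build an http proxy URL with percent-encoded credentials."""
    return "http://{}:{}@{}:{}".format(
        quote(user, safe=""), quote(password, safe=""), host, port
    )


print(proxy_url("proxy.example.com", 8080, "user", "p@ss:word"))
# http://user:p%40ss%3Aword@proxy.example.com:8080
```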
Please try:
import nltk
nltk.download()
After running this, you get something like this:
NLTK Downloader
---------------------------------------------------------------------------
d) Download l) List u) Update c) Config h) Help q) Quit
---------------------------------------------------------------------------
Then press d and proceed as follows:
Downloader> d all
On completion you will get the following message. At the prompt, press q:
Done downloading collection all
You can't have a saved Python file called nltk.py, because the interpreter imports that file instead of the actual nltk package.
Rename the file that the Python shell is reading from and try what you were doing originally:
import nltk and then nltk.download()
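A quick way to check for this kind of shadowing is to look for a local file with the same name as the package. A minimal sketch using only the standard library; find_shadowing_file is a hypothetical helper, and it checks a single directory (by default the current one), which is where a script or interactive shell usually picks up such a file.

```python
import os


def find_shadowing_file(module_name, directory="."):
    """Return the absolute path of <directory>/<module_name>.py if it exists, else None."""
    candidate = os.path.join(directory, module_name + ".py")
    return os.path.abspath(candidate) if os.path.isfile(candidate) else None


shadow = find_shadowing_file("nltk")
if shadow:
    print("Rename or delete:", shadow)
else:
    print("No nltk.py in the current directory.")
```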
It's very simple:
Open PyScripter or any editor.
Create a Python file, e.g. install.py.
Write the code below in it:
import nltk
nltk.download()
A pop-up window will appear. Click on Download.
I had a similar issue. Check whether you are using a proxy.
If so, set up the proxy before downloading:
nltk.set_proxy('http://proxy.example.com:3128', ('USERNAME', 'PASSWORD'))
If you are running a really old version of nltk, then there is indeed no download module available (reference)
Try this:
import nltk
print(nltk.__version__)
As per the reference, anything after 0.9.5 should be fine
You should add Python to your PATH when installing Python. After installation, open a command prompt and run pip install nltk.
Then go to IDLE, open a new file, save it as file.py, and open file.py.
Type the following:
import nltk
nltk.download()
Try downloading the zip files from http://www.nltk.org/nltk_data/, then unzip them and save them in your Python folder, such as C:\ProgramData\Anaconda3\nltk_data
If you previously saved a file named nltk.py and then renamed it to my_nltk_script.py, check whether the file nltk.py still exists. If it does, delete it and run my_nltk_script.py. It should work!
Just do this:
import nltk
nltk.download()
You will then be shown a popup asking what to download. Select 'all'. It will take some time because of its size, but it will finish eventually.
If you are using Google Colab, you can use
nltk.download(download_dir='/content/nltkdata')
After running that, you will be asked to select from a list:
NLTK Downloader
---------------------------------------------------------------------------
d) Download l) List u) Update c) Config h) Help q) Quit
---------------------------------------------------------------------------
Downloader> d
Here you enter d, since you want to download.
After that you will be asked for the identifier you want to download. You can see the list of available identifiers with the l command, or, if you want all of them, just enter all at the prompt.
Then you will see something like:
Downloading collection 'all'
|
| Downloading package abc to /content/nltkdata...
| Unzipping corpora/abc.zip.
| Downloading package alpino to /content/nltkdata...
| Unzipping corpora/alpino.zip.
| Downloading package biocreative_ppi to /content/nltkdata...
| Unzipping corpora/biocreative_ppi.zip.
| Downloading package brown to /content/nltkdata...
| Unzipping corpora/brown.zip.
| Downloading package brown_tei to /content/nltkdata...
| Unzipping corpora/brown_tei.zip.
| Downloading package cess_cat to /content/nltkdata...
| Unzipping corpora/cess_cat.zip.
.
.
.
| Unzipping models/wmt15_eval.zip.
| Downloading package mwa_ppdb to /content/nltkdata...
| Unzipping misc/mwa_ppdb.zip.
|
Done downloading collection all
---------------------------------------------------------------------------
d) Download l) List u) Update c) Config h) Help q) Quit
---------------------------------------------------------------------------
Downloader> q
True
Finally, enter q to quit.
You may try:
>>> import nltk
>>> nltk.download_shell()
Downloader> d
Identifier> *name of the package*
happy nlp'ing.

Python 3: parse html with XPath error

I am new to Python 3. I am parsing HTML data with XPath. I use PyCharm to run my code, which is shown below. Please help me fix the issue (please don't use Beautiful Soup). I know a lot of code for parsing HTML with XPath in Python 2; if you have links to material about parsing HTML with XPath in Python 3, please tell me. I have installed the lxml and requests libraries in PyCharm. Also, the terminal default is Python 2.7. Thanks in advance!
from lxml import html
import requests
page = requests.get('http://econpy.pythonanywhere.com/ex/001.html')
tree = html.fromstring(page.content)
# This will create a list of buyers:
buyers = tree.xpath('//div[@title="buyer-name"]/text()')
# This will create a list of prices:
prices = tree.xpath('//span[@class="item-price"]/text()')
print('Buyers: ', buyers)
print('Prices: ', prices)
The errors:
/Library/Frameworks/Python.framework/Versions/3.5/bin/python3.5 /Users/tianke0711/PycharmProjects/database/Pax_html/xpath_test.py
Traceback (most recent call last):
  File "/Users/tianke0711/PycharmProjects/database/Pax_html/xpath_test.py", line 1, in <module>
    from lxml import html
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/lxml/html/__init__.py", line 54, in <module>
    from .. import etree
ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/lxml/etree.cpython-35m-darwin.so, 2): Library not loaded: libxml2.2.dylib
  Referenced from: /Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/lxml/etree.cpython-35m-darwin.so
  Reason: Incompatible library version: etree.cpython-35m-darwin.so requires version 12.0.0 or later, but libxml2.2.dylib provides version 10.0.0
Based on the error message (Library not loaded: libxml2.2.dylib), the installed libxml2 is an old version, and the lxml build for Python 3 needs a newer one. I used the following commands to install the new libxml2, and then it worked for me.
brew install libxml2
brew install libxslt
brew link libxml2 --force
brew link libxslt --force
Actually, I don't know the reason in detail. If anyone knows, please tell me! Thanks!
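For reference, XPath attribute predicates are written with @, as in //div[@title="buyer-name"]. The sketch below shows the same kind of query with xml.etree.ElementTree from the standard library, so it runs without lxml. ElementTree supports only a subset of XPath (there is no /text(), so the text is read from each element), and the markup and values are made-up sample data, not the contents of the page in the question.

```python
import xml.etree.ElementTree as ET

# Made-up sample markup for illustration only.
SAMPLE = """
<body>
  <div title="buyer-name">Alice</div><span class="item-price">$10.00</span>
  <div title="buyer-name">Bob</div><span class="item-price">$12.50</span>
</body>
"""

root = ET.fromstring(SAMPLE.strip())

# Select elements by attribute value, then read their text.
buyers = [div.text for div in root.findall(".//div[@title='buyer-name']")]
prices = [span.text for span in root.findall(".//span[@class='item-price']")]

print("Buyers:", buyers)
print("Prices:", prices)
```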

Reraising an exception in Cython on Python 2 and Python3

I have some Cython code that currently looks like this:
exc = sys.exc_info()
raise exc[0], exc[1], exc[2]
This doesn't work on Python3, since the "raise from tuple" form is no longer allowed. Were this normal Python code, I would just use six.reraise, but that's not available to me here. What's the Cython friendly way to do the same, which works on both Python2 and Python3?
One great Cython feature is that the generated C code can be compiled for either Python 2 or Python 3. So your example above will work with either version of Python, unmodified.
You can tell Cython to compile code assuming Python 2 syntax and semantics (the -2 argument, which is on by default) or assuming Python 3 (the -3 argument). In either case, the resulting extension module source code can be compiled and used for Python 2 or Python 3, as long as the dynamic components (imports, etc.) are compatible.
For example:
def raises_exception():
    raise KeyError("what you doin'?")

def foobar():
    try:
        raises_exception()
    except Exception:
        import sys
        exc = sys.exc_info()
        raise exc[0], exc[1], exc[2]
Here's a setup.py that will work on either Py2 or Py3:
from distutils.core import setup
from Cython.Build import cythonize
setup(ext_modules=cythonize("reraise.pyx"))
I can run python setup.py build_ext -i on either version of Python (provided I have cython installed for each), and the resulting extension module will work.
$ python setup.py build_ext -i # Py3 python
$ ipython3
Python 3.3.2 (v3.3.2:d047928ae3f6, Oct 4 2013, 15:49:17)
Type "copyright", "credits" or "license" for more information.
IPython 1.2.1 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: import reraise
In [2]: reraise.foobar()
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-2-9e20eacfd84e> in <module>()
----> 1 reraise.foobar()
/.../reraise.so in reraise.foobar (reraise.c:916)()
/.../reraise.so in reraise.foobar (reraise.c:847)()
/.../reraise.so in reraise.raises_exception (reraise.c:762)()
KeyError: "what you doin'?"
In [3]:
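For comparison, in plain Python 3 source (outside Cython) the three-argument raise is a syntax error. The usual equivalent there is to re-raise the exception value with its traceback attached. A minimal sketch in plain Python 3, not taken from the answer above:

```python
import sys
import traceback


def raises_exception():
    raise KeyError("what you doin'?")


def foobar():
    try:
        raises_exception()
    except Exception:
        exc_type, exc_value, exc_tb = sys.exc_info()
        # Python 3 spelling of: raise exc[0], exc[1], exc[2]
        raise exc_value.with_traceback(exc_tb)


try:
    foobar()
except KeyError as err:
    # The original frame (raises_exception) is still part of the traceback.
    frames = [frame.name for frame in traceback.extract_tb(err.__traceback__)]
    print(type(err).__name__, frames)
```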