Python 3: parse html with XPath error

I am new to Python 3. I am parsing HTML data with XPath and I use PyCharm to run my code, which is shown below. Please help me fix the issue (please don't use Beautiful Soup). I know a lot of code for parsing HTML with XPath in Python 2; if you have links to material about parsing HTML with XPath in Python 3, please share them. I have installed the lxml and requests libraries in PyCharm. Also, the terminal's default Python is 2.7. Thanks in advance!
from lxml import html
import requests
page = requests.get('http://econpy.pythonanywhere.com/ex/001.html')
tree = html.fromstring(page.content)
# This will create a list of buyers (XPath selects attributes with @, not #):
buyers = tree.xpath('//div[@title="buyer-name"]/text()')
# This will create a list of prices:
prices = tree.xpath('//span[@class="item-price"]/text()')
print('Buyers: ', buyers)
print('Prices: ', prices)
The error:
/Library/Frameworks/Python.framework/Versions/3.5/bin/python3.5 /Users/tianke0711/PycharmProjects/database/Pax_html/xpath_test.py
Traceback (most recent call last):
  File "/Users/tianke0711/PycharmProjects/database/Pax_html/xpath_test.py", line 1, in <module>
    from lxml import html
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/lxml/html/__init__.py", line 54, in <module>
    from .. import etree
ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/lxml/etree.cpython-35m-darwin.so, 2): Library not loaded: libxml2.2.dylib
  Referenced from: /Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/lxml/etree.cpython-35m-darwin.so
  Reason: Incompatible library version: etree.cpython-35m-darwin.so requires version 12.0.0 or later, but libxml2.2.dylib provides version 10.0.0

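The error says that lxml's compiled extension (etree.cpython-35m-darwin.so) requires libxml2.2.dylib version 12.0.0 or later, but the libxml2.2.dylib that actually gets loaded provides only version 10.0.0, i.e. an older libxml2. lxml for Python 3 needs the newer libxml2. Installing the new libxml2 (and libxslt) with the following commands fixed it for me: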
brew install libxml2
brew install libxslt
brew link libxml2 --force
brew link libxslt --force
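After relinking, one way to confirm which libxml2 lxml really picks up is to compare the version lxml was compiled against with the version it finds at runtime. The sketch below assumes lxml now imports successfully and uses lxml's `etree.LIBXML_COMPILED_VERSION` and `etree.LIBXML_VERSION` constants; the helper name `libxml2_versions` is my own. A runtime libxml2 older than the compiled one points to the same kind of mismatch as the dlopen error.

```python
from lxml import etree

def libxml2_versions():
    """Return the libxml2 versions lxml was built against and is running with."""
    return {
        "compiled": etree.LIBXML_COMPILED_VERSION,  # version at build time
        "runtime": etree.LIBXML_VERSION,            # version actually loaded now
    }

versions = libxml2_versions()
print("libxml2 compiled:", versions["compiled"])
print("libxml2 runtime: ", versions["runtime"])
if versions["runtime"] < versions["compiled"]:
    # Tuples compare element by element, so this catches an older runtime library.
    print("Runtime libxml2 is older than the build-time one; check which dylib is linked.")
```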
Actually, I don't know the reason in detail. If anyone knows, please tell me! Thanks!
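Once lxml imports correctly, the XPath expressions from the question can be tried offline, without requests. Note that XPath selects attributes with `@` (`[@title="buyer-name"]`), not `#`. The HTML string below is a made-up stand-in shaped like the example page; the buyer names and prices are illustrative, not the page's real content.

```python
from lxml import html

# Hypothetical markup shaped like the example page (not its real content).
SAMPLE = """
<html><body>
  <div title="buyer-name">Buyer A</div>
  <span class="item-price">$1.00</span>
  <div title="buyer-name">Buyer B</div>
  <span class="item-price">$2.50</span>
</body></html>
"""

tree = html.fromstring(SAMPLE)
# '@' selects an attribute; text() returns each element's text node as a string.
buyers = tree.xpath('//div[@title="buyer-name"]/text()')
prices = tree.xpath('//span[@class="item-price"]/text()')
print('Buyers: ', buyers)
print('Prices: ', prices)
```

With the live page, replace `SAMPLE` with `page.content` from the question's `requests.get` call.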

Related

ModuleNotFoundError: No module named 'torchvision.models.feature_extraction'

I want to extract features from ResNet101; however, I have trouble importing torchvision.models.feature_extraction.
Here is my code:
from torchvision import models
from torchvision.models.feature_extractor import create_feature_extractor
res101 = models.resnet101(pretrained=True)
extractor = create_feature_extractor(
    res101,
    return_nodes=[
        "conv1",
        "maxpool",
        "layer1",
        "layer2",
        "layer3",
        "layer4",
    ]
)
features = extractor(inputs)
And here is the error
from torchvision.models.feature_extractor import create_feature_extractor
Traceback (most recent call last):
Input In [11] in <cell line: 1>
from torchvision.models.feature_extractor import create_feature_extractor
ModuleNotFoundError: No module named 'torchvision.models.feature_extractor'
You might be trying to use something like:
from torchvision.models.feature_extraction import create_feature_extractor
Note the difference: extraction vs. extractor.
Check this module
Same problem here. I installed PyTorch using conda, and it works fine in Jupyter notebooks, but it does not work in the terminal.
It turned out that the torchvision version listed by pip was 0.8.2.
I solved it by updating torchvision using pip.
Perhaps some other package had installed the old version for me. I hope my experience helps.
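When a module imports in Jupyter but not in the terminal, the two are often different interpreters with different torchvision installs. A small standard-library check like the one below, run in both places, shows which Python is running, which torchvision version it sees, and whether `torchvision.models.feature_extraction` can be found. As far as I know, that module first shipped in torchvision 0.11, so older installs will not have it. The helper names `module_available` and `installed_version` are my own, not part of torchvision.

```python
import importlib.util
import sys
from importlib.metadata import PackageNotFoundError, version

def module_available(name):
    """True if the (possibly dotted) module `name` can be found by this interpreter."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # find_spec imports parent packages first; a missing parent raises here.
        return False

def installed_version(dist):
    """Installed version string of distribution `dist`, or None if it is absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None

print("Python:", sys.executable)
print("torchvision:", installed_version("torchvision"))
print("feature_extraction importable:",
      module_available("torchvision.models.feature_extraction"))
```

If the two environments print different `Python:` paths, upgrading torchvision with that interpreter's own pip (for example `python -m pip install -U torchvision`) targets the right install.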

Integrating Apache Superset and Apache Drill

I installed Apache Drill through the link in the Drill Documentation. Apache Drill works fine. I also installed and got Apache Superset running using docker. Superset also works totally fine.
But my goal is to integrate Superset and Drill. The only tutorial I could find was one from Dataist, which asks you to add a database.
Since I am running both Drill and Superset on my local machine, the tutorial says to enter drill+sadrill://localhost:8047/dfs/test?use_ssl=False as the SQLAlchemy URI and then check it by pressing Test Connection.
When I press Test Connection, I get the following error message:
ERROR: Connection failed!
The error message returned was:
Can't load plugin: sqlalchemy.dialects:drill.sadrill
Stacktrace:
Traceback (most recent call last):
  File "/home/superset/superset/views/core.py", line 1755, in testconn
    engine = database.get_sqla_engine(user_name=username)
  File "/home/superset/superset/utils/core.py", line 132, in __call__
    value = self.func(*args, **kwargs)
  File "/home/superset/superset/models/core.py", line 911, in get_sqla_engine
    return create_engine(url, **params)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/__init__.py", line 435, in create_engine
    return strategy.create(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 61, in create
    entrypoint = u._get_entrypoint()
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/url.py", line 172, in _get_entrypoint
    cls = registry.load(name)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 240, in load
    "Can't load plugin: %s:%s" % (self.group, name)
sqlalchemy.exc.NoSuchModuleError: Can't load plugin: sqlalchemy.dialects:drill.sadrill
Can someone explain why I am getting this error? Also, are there any other tutorials that give a better idea of how to set up Drill and Superset together?
I encountered a similar issue while trying to connect to Elasticsearch. I guess the Docker image you used was amancevice/superset. The issue occurs because the image does not ship the latest SQLAlchemy and SQLAlchemy-Utils packages. Upgrading or reinstalling these packages fixes it.
To uninstall:
pip uninstall SQLAlchemy
pip uninstall SQLAlchemy-Utils
To install again (latest version):
pip install SQLAlchemy
pip install SQLAlchemy-Utils
I have reported this issue at https://github.com/amancevice/docker-superset/issues/158; maybe it will be fixed in upcoming images.
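The message `Can't load plugin: sqlalchemy.dialects:drill.sadrill` means SQLAlchemy looked in the `sqlalchemy.dialects` entry-point group for a plugin named `drill.sadrill` and found none. A quick diagnostic, not part of the fix above, is to list which plugin dialects the container's Python can see using the standard library's `importlib.metadata` (Python 3.8+). The traceback shows Python 3.6 in this image, where `importlib.metadata` is not in the standard library, so there the `importlib_metadata` backport would be needed instead. The helper name `registered_dialects` is my own.

```python
from importlib.metadata import entry_points

def registered_dialects(group="sqlalchemy.dialects"):
    """List entry-point names registered in `group` (e.g. 'drill.sadrill')."""
    eps = entry_points()
    if hasattr(eps, "select"):      # Python 3.10+: filter by group
        selected = eps.select(group=group)
    else:                           # Python 3.8/3.9: dict of group -> entry points
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)

names = registered_dialects()
print(names)
print("drill.sadrill registered:", "drill.sadrill" in names)
```

Run it with the same Python that Superset uses (inside the container); if `drill.sadrill` is missing, SQLAlchemy has no way to build a `drill+sadrill://` engine.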

Python3 html to pdf

How do I convert HTML to PDF in Python 3?
xhtml2pdf does not work in Python 3; I get this error:
import xhtml2pdf.pisa as pisa
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/hound/test/python/test_env/lib/python3.4/site-packages/xhtml2pdf/__init__.py", line 41, in <module>
from xhtml2pdf.util import REPORTLAB22
File "/home/hound/test/python/test_env/lib/python3.4/site-packages/xhtml2pdf/util.py", line 302
raise Exception, "box not defined right way"
^
SyntaxError: invalid syntax
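The SyntaxError comes from `util.py` line 302, which uses the Python 2 form `raise Exception, "message"`. Python 3 only accepts `raise Exception("message")`, so the whole module fails to compile before any of it runs. This small sketch uses the built-in `compile()` to show the difference; `compiles` is a hypothetical helper name.

```python
def compiles(source):
    """Return True if `source` is valid Python 3 syntax, False on SyntaxError."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True

py2_style = 'raise Exception, "box not defined right way"'   # Python 2 only
py3_style = 'raise Exception("box not defined right way")'   # Python 3
print(compiles(py2_style))  # False
print(compiles(py3_style))  # True
```

Because the failure happens at compile time, no wrapper code can work around it; you need a package version written for Python 3.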
The best option I have found so far is WeasyPrint.
From the documentation:
from weasyprint import HTML
HTML('http://weasyprint.org/').write_pdf('/tmp/weasyprint-website.pdf')
And it really is that easy. It saved me tons of time, after I had already wasted time trying (and failing) to get xhtml2pdf and others to work in Python 3.
I had the same error. Apparently, xhtml2pdf currently supports Python 3 only in its prerelease version, 0.2b1 (for more info, see https://pypi.python.org/pypi/xhtml2pdf). I solved the problem by uninstalling the previous xhtml2pdf version and installing the prerelease version:
pip install --pre xhtml2pdf

Running Google's DeepDream on Windows with CUDA: ImportError DLL load failed

I have built a .dll of _caffe.cpp on Windows (Release, x64).
I changed the extension from .dll to .pyd and am trying to import it in Python:
import caffe
File "\caffe-master\python\caffe\__init__.py", line 1, in <module>
from .pycaffe import Net, SGDSolver
File "\caffe-master\python\caffe\pycaffe.py", line 13, in <module>
from ._caffe import Net, SGDSolver
ImportError: DLL load failed: The specified module could not be found.
What does this mean? Is some dependency missing that was included in the Visual Studio project where I built this DLL?
You need to add Python Caffe to PYTHONPATH. For example:
export PYTHONPATH=$PYTHONPATH:/home/username/caffe/python
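The `export` above only affects shells that inherit it. If you would rather not change the environment, a rough in-process equivalent is to prepend the directory to `sys.path` before importing. The path is the same placeholder as in the export; substitute your own, and `prepend_to_sys_path` is a hypothetical helper name. Note that this only helps Python find the `caffe` package; it does not change where Windows looks for the DLLs that `_caffe.pyd` itself depends on.

```python
import sys

def prepend_to_sys_path(directory):
    """Make `directory` the first place this process searches for modules."""
    if directory in sys.path:
        sys.path.remove(directory)  # avoid duplicate entries
    sys.path.insert(0, directory)

# Same placeholder path as the PYTHONPATH export above.
prepend_to_sys_path("/home/username/caffe/python")
# import caffe  # would now be looked up in that directory first
```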
For Windows:
Adding /caffe/Build/x64/Release/pycaffe to the system path (PATH) works for me, and I think the best way to do it is:
Create a new system variable: PYTHON_PKG = /caffe/Build/x64/Release/pycaffe
Include PYTHON_PKG in PATH: path = %PYTHON_PKG%;%OtherDirs%
After doing this, I got an error that the package google.internal was missing, so I ran pip install google.internal in CMD. Now it works.
Once you have compiled and built Caffe, try:
export PYTHONPATH=/path/to/caffe-dir/python
You may also need to run the following:
pip install -r requirements.txt

I get an error message when I try FreqDist() in NLTK -- NameError: name 'nltk' is not defined

I'm learning NLTK, and everything works fine on my Mac except FreqDist(). (I saw another question about FreqDist(), but that user was getting a different error message: TypeError: unhashable type: 'list'.)
Here's an example:
>>> from nltk.corpus import brown
>>> news_text = brown.words(categories='news')
>>> fdist = nltk.FreqDist([w.lower() for w in news_text])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'nltk' is not defined
This error message is consistent: I get it every time I try FreqDist(). Other commands, like >>> brown.fileids(), work fine.
Thanks for your help!
Before you can use FreqDist, you need to import it.
Add a line as follows:
import nltk
Or, if you only want to use FreqDist, try this:
>>> from nltk.corpus import brown
>>> from nltk import FreqDist
>>> news_text = brown.words(categories='news')
>>> fdist = FreqDist([w.lower() for w in news_text])
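FreqDist counts how often each sample occurs; in NLTK it is a subclass of `collections.Counter`. To see what the call above computes without downloading the Brown corpus, here is the same lower-casing and counting done with plain `Counter` on a short made-up word list, a stand-in for `brown.words(categories='news')` rather than real corpus data:

```python
from collections import Counter

# Made-up sample standing in for brown.words(categories='news').
news_text = ["The", "jury", "said", "the", "election", "was", "The", "end"]

# Same expression as the FreqDist call, with Counter in place of FreqDist.
fdist = Counter(w.lower() for w in news_text)
print(fdist["the"])           # 3
print(fdist.most_common(1))   # [('the', 3)]
```

Since FreqDist inherits from Counter, methods such as `most_common` behave the same way on the real `fdist`.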
This may mean you haven't installed nltk.
Follow these steps to install nltk:
1. Go to https://pypi.python.org/pypi/setuptools. At the end of the page you will find setuptools-7.0.zip (md5); download it and unzip it. Inside you will find the easy_install.py Python script.
2. Run sudo easy_install pip. pip will then be installed and ready to use (make sure you are in the directory that contains the easy_install script).
3. Run sudo pip install -U nltk. If this completes successfully, nltk is installed.
4. Open IDLE and type the following:
import nltk
If nltk is installed properly, you will simply be returned to the prompt without an error.
setuptools is required only for older versions of Python; there is no need for it if you are running 3.2+.
You can download nltk from https://pypi.python.org/pypi/nltk
For more information, see http://www.nltk.org/install.html
Some nltk features require data that you need to download first.
For example, run the following code to download the stopwords data and then use it:
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
stopwords.words("english")