UTF-8 in Python 3 - how to override sys.stdout.encoding? - json

MCVE
In terminal while loading IPython/Jupyter Notebook as .json:
$ python
Python 3.5.2 | packaged by conda-forge | (default, Jul 26 2016, 01:32:08)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> import io
>>> with io.open('Untitled.ipynb', 'r', encoding='utf-8') as sf:
...     data = json.load(sf)
...     print(data)  # the error is actually raised on this line, not on json.load
...
Traceback (most recent call last):
File "<stdin>", line 4, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\xf3' in position 189292: ordinal not in range(128)
How to make this work?
Also:
>>> print(sys.getdefaultencoding())
utf-8
>>> print(sys.stdout.encoding)
ANSI_X3.4-1968 # I assume this is the main reason why it does not work
What have I tried?
export PYTHONIOENCODING=UTF-8 does not work.
import importlib; importlib.reload(sys); sys.setdefaultencoding("UTF-8") does not work.
sys.stdout = codecs.getwriter('utf-8')(sys.stdout.buffer, 'strict') gives TypeError: write() argument must be str, not bytes

Related

Hazm: POSTagger(): ArgumentError: argument 2: <class 'TypeError'>: wrong type

I get an error when running the code below. Could you give me some help?
from __future__ import unicode_literals
from hazm import *
tagger = POSTagger(model='resources/postagger.model')
tagger.tag(word_tokenize('ما بسیار کتاب موانیم'))
Error:
---------------------------------------------------------------------------
ArgumentError Traceback (most recent call last)
<ipython-input-16-1d74d781e0c1> in <module>
1 tagger = POSTagger(model='resources/postagger.model')
----> 2 tagger = POSTagger()
3 tagger.tag(word_tokenize('ما بسیار کتاب موانیم'))
~/.local/lib/python3.6/site-packages/hazm/SequenceTagger.py in __init__(self, patterns, **options)
21 def __init__(self, patterns=[], **options):
22 from wapiti import Model
---> 23 self.model = Model(patterns='\n'.join(patterns), **options)
24
25 def train(self, sentences):
~/.local/lib/python3.6/site-packages/wapiti/api.py in __init__(self, patterns, encoding, **options)
283 self._model = _wapiti.api_new_model(
284 ctypes.pointer(self.options),
--> 285 self.patterns
286 )
287
ArgumentError: argument 2: <class 'TypeError'>: wrong type
I am using Ubuntu 18.04 on Windows 10. I also put the mentioned model files in a resources folder next to the code.
Python 3.6.9
hazm package
I have no problem running the Chunker from the same package:
chunker = Chunker(model='resources/chunker.model')
tagged = tagger.tag(word_tokenize('واقعا ک بعضیا چقد بی درکن و ادعا دارن فقط بنده خدا لابد دسترسی نداره ب دکتری چیزی نگران شد'))
tree2brackets(chunker.parse(tagged))
It's because of the wapiti package: wapiti does not support Python 3 and only works with Python 2. If you need a POS tagger, you should use another POS tagger package.
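That diagnosis can be reproduced with plain ctypes, independent of hazm: wapiti's wrapper passes a Python 3 str where the C function is declared to take c_char_p, and ctypes only auto-converts bytes. A minimal sketch (POSIX-only; it uses libc's strlen, which has nothing to do with wapiti beyond illustrating the type check):

```python
import ctypes

# CDLL(None) returns a handle to the running process on POSIX systems,
# which exposes libc symbols such as strlen.
libc = ctypes.CDLL(None)
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]   # the C side expects a byte string
strlen.restype = ctypes.c_size_t

print(strlen(b"hazm"))   # bytes are accepted: prints 4

try:
    strlen("hazm")       # str is rejected under Python 3
except ctypes.ArgumentError as err:
    print(err)           # a "wrong type" ArgumentError, like the traceback above
```

Under Python 2, "hazm" would have been a byte string and passed straight through, which is why the package only works there.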

converting h5 file to csv using h5py

How can I convert an .h5 file to its corresponding CSV file? These .h5 files are the output of DeepLabCut.
import h5py
import pandas as pd
path_to_file = '/scratch3/3d_pose/animalpose/experiments/moth-filtered-Mona-2019-12-06_10k_iters/videos/mothDLC_resnet50_moth-filteredDec6shuffle1_10000.h5'
f = h5py.File(path_to_file, 'r')
print(f.keys())
I get:
scratch/sjn-p3/anaconda/anaconda3/bin/python /scratch3/pycharm-2019.3/plugins/python/helpers/pydev/pydevconsole.py --mode=client --port=42112
import sys; print('Python %s on %s' % (sys.version, sys.platform))
sys.path.extend(['/scratch3/3d_pose/animalpose/moth_original'])
Python 3.6.7 | packaged by conda-forge | (default, Feb 28 2019, 09:07:38)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.
PyDev console: using IPython 6.2.1
Python 3.6.7 | packaged by conda-forge | (default, Feb 28 2019, 09:07:38)
[GCC 7.3.0] on linux
runfile('/scratch3/3d_pose/animalpose/moth_original/converth5tocsv.py', wdir='/scratch3/3d_pose/animalpose/moth_original')
/scratch3/pycharm-2019.3/plugins/python/helpers/pydev/_pydev_bundle/pydev_import_hook.py:21: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
module = self._system_import(name, *args, **kwargs)
KeysView(<HDF5 file "mothDLC_resnet50_moth-filteredDec6shuffle1_10000.h5" (mode r)>)
so I am not sure how to proceed when print(f.keys()) does not actually list any keys

About deeplearning.net's Theano tutorial

I am following http://deeplearning.net/tutorial/gettingstarted.html, using Python 3.5 on Windows 7.
import pickle, gzip, numpy
f = gzip.open('mnist.pkl.gz', 'rb')
train_set, valid_set, test_set = pickle.load(f)
f.close()
But I get the error:
Traceback (most recent call last):
File "e:\python_workspace\theanoTest\DataSet.py", line 7, in <module>
train_set, valid_set, test_set = pickle.load(f)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 614: ordinal not in range(128)
But in Python 3 the default encoding is 'utf-8':
import sys
print(sys.getdefaultencoding())
So why does the error occur?
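The default encoding reported there does not apply here: pickle.load has its own encoding parameter (default 'ASCII'), used to decode the raw byte strings that Python 2 pickles contain, and the MNIST file was written by Python 2. The usual fix is encoding='latin-1', which maps every byte 0x00-0xff to a code point and so cannot fail. A sketch, demonstrated on a tiny hand-made Python 2 pickle containing the very byte 0x90 from the traceback:

```python
import pickle

# A protocol-0 Python 2 pickle of the byte string '\x90'.
py2_pickle = b"S'\\x90'\n."

# The default encoding ('ascii') reproduces the question's failure:
try:
    pickle.loads(py2_pickle)
except UnicodeDecodeError as err:
    print("default fails:", err)

# 'latin-1' decodes any byte, so the load succeeds:
data = pickle.loads(py2_pickle, encoding="latin-1")
print(repr(data))

# Applied to the tutorial's file (not run here):
# with gzip.open('mnist.pkl.gz', 'rb') as f:
#     train_set, valid_set, test_set = pickle.load(f, encoding='latin-1')
```

encoding='bytes' is the other documented option; it leaves the strings as bytes objects instead of decoding them.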

python ordered_dict from json

I am using Python 2.6.6 and trying to generate an OrderedDict from a JSON string. I understand that I could use the object_pairs_hook of the json decoder/loads, but unfortunately it is not supported in 2.6.6. Is there any way out?
e.g.
template_s = '{ "aa": {"_type": "T1"}, "bb": {"_type": "T11"}}'
json.loads(template_s, object_pairs_hook=OrderedDict)
>>> json.loads(template_s, object_pairs_hook=OrderedDict)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.6/json/__init__.py", line 318, in loads
return cls(encoding=encoding, **kw).decode(s)
TypeError: __init__() got an unexpected keyword argument 'object_pairs_hook'
Thanks
I was able to do the same with simplejson:
import simplejson as json
json.loads(config_str, object_pairs_hook=json.OrderedDict)
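For readers on Python 2.7+ or 3.x, the original object_pairs_hook attempt works as written; a quick check with the question's own template_s:

```python
import json
from collections import OrderedDict

template_s = '{ "aa": {"_type": "T1"}, "bb": {"_type": "T11"}}'
data = json.loads(template_s, object_pairs_hook=OrderedDict)

print(list(data.keys()))          # insertion order is preserved: ['aa', 'bb']
print(type(data["aa"]).__name__)  # nested objects become OrderedDict too
```

The hook receives each object as a list of (key, value) pairs in document order, so it applies at every nesting level, not just the top one.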

Running Stanford POS tagger in NLTK leads to "not a valid Win32 application" on Windows

I am trying to use the Stanford POS tagger in NLTK with the following code:
import nltk
from nltk.tag.stanford import POSTagger
st = POSTagger('E:\Assistant\models\english-bidirectional-distsim.tagger',
               'E:\Assistant\stanford-postagger.jar')
st.tag('What is the airspeed of an unladen swallow?'.split())
and here is the output:
Traceback (most recent call last):
File "E:\J2EE\eclipse\WSNLP\nlp\src\tagger.py", line 5, in <module>
st.tag('What is the airspeed of an unladen swallow?'.split())
File "C:\Python34\lib\site-packages\nltk\tag\stanford.py", line 59, in tag
return self.tag_sents([tokens])[0]
File "C:\Python34\lib\site-packages\nltk\tag\stanford.py", line 81, in tag_sents
stdout=PIPE, stderr=PIPE)
File "C:\Python34\lib\site-packages\nltk\internals.py", line 153, in java
p = subprocess.Popen(cmd, stdin=stdin, stdout=stdout, stderr=stderr)
File "C:\Python34\lib\subprocess.py", line 858, in __init__
restore_signals, start_new_session)
File "C:\Python34\lib\subprocess.py", line 1111, in _execute_child
startupinfo)
OSError: [WinError 193] %1 is not a valid Win32 application
P.S. My JAVA_HOME is set and I have no problem with my Java installation. Can someone explain what this error means? It is not informative to me. Thanks in advance.
Looks like your Java installation is botched or missing.
It worked after a lot of trial and error. It seems that NLTK's internals cannot find the Java binary automatically on Windows, so we need to point to it explicitly:
import os
import nltk
from nltk.tag.stanford import POSTagger
os.environ['JAVA_HOME'] = r'C:\Program Files\Java\jre6\bin'
st = POSTagger(r'E:\stanford-postagger-2014-10-26\models\english-left3words-distsim.tagger',
               r'E:\stanford-postagger-2014-10-26\stanford-postagger.jar')
st.tag(nltk.word_tokenize('What is the airspeed of an unladen swallow?'))
As one of the gurus said to me: "Don't forget to add r when working with \ in strings."