simplejson json AttributeError: "module" object has no attribute "dump" - json

I am new to programming and I have a question regarding the following error.
I am using Python 2.7, and I have the following script to create a simple graph (example taken from Python Crash Course by Eric Matthes):
import matplotlib.pyplot as plt
squares = [1,4,9,16,25]
plt.plot(squares, linewidth = 5)
# Set chart title and label axes.
plt.title("Square Numbers", fontsize = 24)
plt.xlabel("Value", fontsize = 14)
plt.ylabel("Square of Value", fontsize = 14)
# Set size of tick labels
plt.tick_params(axis = "both", labelsize = 14)
plt.show()
When I ran this script in Windows PowerShell I got the following error:
Traceback (most recent call last):
  File "mpl_squares.py", line 1, in <module>
    import matplotlib.pyplot as plt
  File "C:\Users\Roger\Anaconda2\lib\site-packages\matplotlib\__init__.py", line 134, in <module>
    from ._version import get_versions
  File "C:\Users\Roger\Anaconda2\lib\site-packages\matplotlib\_version.py", line 7, in <module>
    import json
  File "C:\Users\Roger\Desktop\lpthw\json.py", line 7, in <module>
AttributeError: "module" object has no attribute "dump"
In another script I had the same problem when importing this module, and I found
a solution: replacing the line "import json" with "import simplejson". It worked well.
Here is the explanation I found back then:
json is simplejson, added to the stdlib. But since json was added in 2.6, simplejson has the advantage of working on more Python versions (2.4+).
simplejson is also updated more frequently than Python, so if you need (or want) the latest version, it's best to use simplejson itself, if possible.
A good practice, in my opinion, is to use one or the other as a fallback.
try: import simplejson as json
except ImportError: import json
Now I checked the module that the error points to, "_version.py".
This is the content of that file:
# This file was generated by 'versioneer.py' (0.15) from
# revision-control system data, or from the parent directory name of an
# unpacked source archive. Distribution tarballs contain a pre-generated copy
# of this file.
import json
import sys
version_json = '''
{
"dirty": false,
"error": null,
"full-revisionid": "26382a72ea234ee0efd40543c8ae4a30cffc4f0d",
"version": "1.5.3"
}
''' # END VERSION_JSON
def get_versions():
    return json.loads(version_json)
Question:
Do I have to fix something in the _version.py module, by replacing
import json with import simplejson and updating the function in the module accordingly?
I am thinking of a workaround to fix the problem, but I don't want to modify anything in _version.py if it would make things worse. Thank you very much for your comments and suggestions.
Best regards

It seems like your C:\Users\Roger\Desktop\lpthw\json.py gets imported instead of Python's built-in json module.
Did you somehow add that folder (C:\Users\Roger\Desktop\lpthw) to your PYTHONPATH, e.g. with sys.path.append() or the PYTHONPATH variable? Read more about how Python finds modules.
The reason why the fix with simplejson works is that it is not overridden by some other module of the same name.
Try renaming C:\Users\Roger\Desktop\lpthw\json.py to something like C:\Users\Roger\Desktop\lpthw\myjson.py and also try to figure out how that lpthw folder made it into your PYTHONPATH.
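To confirm which file Python would pick up for a given module name, something like the following sketch may help. Note that Python also puts the directory of the script being run at the front of sys.path, so a json.py sitting next to the script is enough to shadow the standard library module. The helper name module_origin is made up for this example; it uses importlib.util.find_spec on Python 3.4+ and falls back to imp.find_module on Python 2.7.

```python
try:
    from importlib.util import find_spec  # Python 3.4+

    def module_origin(name):
        """Return the path a top-level module would be loaded from, or None."""
        spec = find_spec(name)
        return spec.origin if spec is not None else None

except ImportError:
    import imp  # Python 2.7 fallback

    def module_origin(name):
        """Return the path a top-level module would be loaded from, or None."""
        try:
            handle, pathname, _ = imp.find_module(name)
        except ImportError:
            return None
        if handle is not None:
            handle.close()
        return pathname


# Run this from the same folder as the failing script.
print(module_origin("json"))
```

If the printed path points into the lpthw folder rather than into the Python installation, the local json.py is the one being imported.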


Is cython compatible with typing.NamedTuple?

I have the following code in file temp.py
from typing import NamedTuple
class C(NamedTuple):
    a: int
    b: int

c = C(1, 2)
I compile it using the command:
cythonize -3 -i temp.py
and run it using the command
python3 -c 'import temp'
I get the following exception:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "temp.py", line 7, in init temp
    c = C(1, 2)
TypeError: __new__() takes 1 positional argument but 3 were given
Version of python: 3.6.15
Version of cython: 0.29.14
Is there anything wrong in the above code/build steps ?
It'll work in the current Cython 3 alpha version (and later). It won't work in Cython 0.29.x (you're using a pretty outdated version of this, but that won't affect this feature).
It requires classes to have an __annotations__ dictionary, which is a feature that was added in the Cython 3 alpha releases.
You won't get much/any speed advantage from compiling this in Cython though - it'll still generate a normal Python class. But it will work.
In short: no, it is not compatible. Edit: not currently compatible.
Named tuples are just Python magic (classes created at runtime) that Cython doesn't know about, so you have to execute that code by calling the interpreter at runtime, using exec.
# temp.pyx
temp_global = {}
exec("""
from typing import NamedTuple
class C(NamedTuple):
    a: int
    b: int
""", temp_global)
C = temp_global['C']
c = C(1, 2)
print(c)
To test it:
import pyximport
pyximport.install()
import temp
This ends up being Python code that is executed whenever you import your binary: the entire block is passed to exec on every import, so it's not really "Cython code". You can just write it as a Python .py file and avoid Cython, or implement your "Cython class" without relying on Python magic (no named tuples or other code that is created dynamically at runtime).
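For the class in the question, another option worth trying is the functional form of typing.NamedTuple, which passes the fields as plain data instead of class-body annotations. Since it is an ordinary function call, it should not depend on the compiled class having an __annotations__ dictionary. That has not been verified here against Cython 0.29.14, so treat it as an assumption.

```python
from typing import NamedTuple

# Functional form: fields are given as (name, type) pairs,
# so no annotated class body is needed.
C = NamedTuple("C", [("a", int), ("b", int)])

c = C(1, 2)
print(c)
```

The resulting class behaves like the one declared with the class syntax (c.a, c.b and tuple unpacking all work), although per-field defaults and methods cannot be declared this way.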

dask.delayed KeyError with distributed scheduler

I have a function interpolate_to_particles written in C and wrapped with ctypes. I want to use dask.delayed to make a series of calls to this function.
The code runs successfully without dask
# Interpolate w/o dask
result = interpolate_to_particles(arg1, arg2, arg3)
and with the distributed scheduler in single-threaded mode
# Interpolate w/ dask
import dask
from dask.distributed import Client
client = Client()
result = dask.delayed(interpolate_to_particles)(arg1, arg2, arg3)
result_c = result.compute(scheduler='single-threaded')
but if I instead call
result_c = result.compute()
I get the following KeyError:
Traceback (most recent call last):
  File "/path/to/lib/python3.6/site-packages/distributed/worker.py", line 3287, in dumps_function
    result = cache_dumps[func]
  File "/path/to/lib/python3.6/site-packages/distributed/utils.py", line 1518, in __getitem__
    value = super().__getitem__(key)
  File "/path/to/lib/python3.6/collections/__init__.py", line 991, in __getitem__
    raise KeyError(key)
KeyError: <function interpolate_to_particles at 0x1228ce510>
The worker logs accessed from the dask dashboard do not provide any information. Actually, I do not see any information that the workers have done anything besides starting up.
Any ideas on what could be occurring, or suggested tools that I can use to further debug? Thanks!
Given your comments it sounds like your function does not serialize well. To test this, you might try pickling the function in one process, and try unpickling it in another.
>>> import pickle
>>> print(pickle.dumps(interpolate_to_particles))
b'some bytes printed out here'
And then in another process
>>> import pickle
>>> interpolate_to_particles = pickle.loads(b'the same bytes you had before')
If this doesn't work then you'll know that that's your problem. I would encourage you to look up "how to make sure that ctypes functions are serializable" or something similar, or ask another question with that smaller scope here on Stack Overflow.
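As a quicker first check before the two-process test above, a same-process round trip can be wrapped in a small helper. This is only a sketch: check_picklable is a made-up name, and passing it is weaker evidence than unpickling in a fresh process, because here the module that defines the function is already loaded.

```python
import pickle


def check_picklable(obj):
    """Return (True, None) if obj survives a pickle round trip, else (False, error)."""
    try:
        pickle.loads(pickle.dumps(obj))
    except Exception as exc:  # pickle raises several different exception types
        return False, exc
    return True, None


# A built-in function pickles by reference; a lambda does not pickle at all.
ok_builtin, _ = check_picklable(len)
ok_lambda, err = check_picklable(lambda x: x)
print(ok_builtin, ok_lambda, type(err).__name__)
```

Calling check_picklable(interpolate_to_particles) in the session that builds the dask graph would show whether serialization fails, and the returned exception usually names the object that could not be pickled.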

How to check encoding of a CSV file

I have a CSV file and I wish to understand its encoding. Is there a menu option in Microsoft Excel that can help me detect it,
or do I need to use a programming language like C# or PHP to deduce it?
You can use Notepad++ to evaluate a file's encoding without needing to write code. The evaluated encoding of the open file will display on the bottom bar, far right side. The encodings supported can be seen by going to Settings -> Preferences -> New Document/Default Directory and looking in the drop down.
On Linux systems, you can use the file command, which reports its best guess at the encoding.
Sample:
file blah.csv
Output:
blah.csv: ISO-8859 text, with very long lines
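If you use Python, you can print the file object to see which encoding Python opened it with. For example:
with open('file_name.csv') as f:
    print(f)
The output is something like this:
<_io.TextIOWrapper name='file_name.csv' mode='r' encoding='utf8'>
Note that this is the encoding Python used to open the file (the platform default when none is passed to open()), which is not necessarily the encoding the file was written in.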
You can also use the Python chardet library:
# install the chardet library (notebook syntax; use "pip install chardet" in a shell)
!pip install chardet
# import the chardet library
import chardet
# use the detect method to find the encoding
# 'rb' means read in the file as binary
with open("test.csv", 'rb') as file:
    print(chardet.detect(file.read()))
Use chardet https://github.com/chardet/chardet (documentation is short and easy to read).
Install Python, then pip install chardet, and finally use the chardetect command-line tool that it installs.
I tested it with GB2312 and it's pretty accurate. (Make sure you have at least a few characters; a sample with only 1 character may fail easily.)
file is not reliable as you can see.
Or you can run the following in a Python console or in a Jupyter Notebook:
import csv
data = open("file.csv", "r")
data
You will see information about the data object like this:
<_io.TextIOWrapper name='file.csv' mode='r' encoding='cp1250'>
As you can see, it contains encoding information.
CSV files have no headers indicating the encoding.
You can only guess by looking at:
the platform / application the file was created on
the bytes in the file
In 2021, emoticons are widely used, but many import tools fail to import them. The chardet library is often recommended in the answers above, but the lib does not handle emoticons well.
icecream = '🍦'
import csv
with open('test.csv', 'w') as f:
    wf = csv.writer(f)
    wf.writerow(['ice cream', icecream])
import chardet
with open('test.csv', 'rb') as f:
    print(chardet.detect(f.read()))
{'encoding': 'Windows-1254', 'confidence': 0.3864823918622268, 'language': 'Turkish'}
This gives UnicodeDecodeError while trying to read the file with this encoding.
The default encoding on Mac is UTF-8. It's passed explicitly here even though that wasn't necessary on Mac; on Windows it might be.
with open('test.csv', 'r', encoding='utf-8') as f:
    print(f.read())
ice cream,🍦
The file command also picked this up
file test.csv
test.csv: UTF-8 Unicode text, with CRLF line terminators
My advice in 2021, if the automatic detection goes wrong: try UTF-8 before resorting to chardet.
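The "try UTF-8 first" advice can be sketched with the standard library alone. The helper name sniff_utf8 and the two throwaway files are made up for this example. It relies on the fact that bytes written in a legacy single-byte encoding rarely happen to form valid UTF-8 once they contain non-ASCII characters, so a strict decode is a reasonable first test.

```python
import codecs
import os
import tempfile


def sniff_utf8(path):
    """Return 'utf-8-sig', 'utf-8', or None if the bytes are not valid UTF-8."""
    with open(path, "rb") as f:
        raw = f.read()
    try:
        raw.decode("utf-8")  # strict decode: raises on any invalid byte sequence
    except UnicodeDecodeError:
        return None  # caller can fall back to chardet or a manual guess
    # A leading byte-order mark means the file should be read with utf-8-sig.
    return "utf-8-sig" if raw.startswith(codecs.BOM_UTF8) else "utf-8"


# Demo on two throwaway files: one UTF-8 with an emoji, one Latin-1.
tmp_dir = tempfile.mkdtemp()
utf8_path = os.path.join(tmp_dir, "utf8.csv")
latin_path = os.path.join(tmp_dir, "latin1.csv")
with open(utf8_path, "wb") as f:
    f.write("ice cream,\U0001F366\r\n".encode("utf-8"))
with open(latin_path, "wb") as f:
    f.write("caf\u00e9,1\r\n".encode("latin-1"))

print(sniff_utf8(utf8_path), sniff_utf8(latin_path))
```

A pure-ASCII file also passes this check, which is harmless because ASCII is a subset of UTF-8. Only when the function returns None is a detector such as chardet needed.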
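In Python, you can try:
import pandas as pd
from encodings.aliases import aliases

# Try every codec Python knows about and report the ones that can read the file.
# A codec that succeeds is not necessarily the right one, only a candidate.
for encoding in sorted(set(aliases.values())):
    try:
        df = pd.read_csv("test.csv", encoding=encoding)
        print('successful', encoding)
    except Exception:
        pass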
As Jitender Kumar mentioned, you can use the file command (it also works in WSL on Windows). On my system, file blah.csv did not show the encoding info, but using the information in man file I was able to get it by running file --exclude encoding blah.csv.
import os

import chardet
import pandas as pd


def read_csv(path: str, size: float = 0.10) -> pd.DataFrame:
    """
    Reads the CSV file located at path and returns it as a Pandas DataFrame,
    using chardet to guess the file's encoding from a leading sample.

    Args:
        path (str): The path to the CSV file.
        size (float): The fraction of the file to be used for detecting the
            encoding. Defaults to 0.10.

    Returns:
        pd.DataFrame: The CSV file as a Pandas DataFrame.

    Raises:
        UnicodeError: If the file still cannot be decoded after the whole
            file has been sampled. Before that, the function retries with a
            larger sample (size increased by 0.20).
    """
    try:
        byte_size = int(os.path.getsize(path) * size)
        with open(path, "rb") as rawdata:
            result = chardet.detect(rawdata.read(byte_size))
        return pd.read_csv(path, encoding=result["encoding"])
    except UnicodeError:
        if size >= 1.0:
            raise  # the whole file was already sampled; give up
        return read_csv(path=path, size=size + 0.20)
Hi, I just added a function to find the correct encoding and read the csv in the given file path. Thought it would be useful
Just add the encoding argument that matches the file you're trying to upload.
open('example.csv', encoding='UTF8')

Cause of jinja2.exceptions.TemplateNotFound even when using just jinja2

The cause of the problem is obvious after the fact, but I'd like to share the not-too-obvious cause here.
When running code such as
import jinja2
templateLoader = jinja2.FileSystemLoader(searchpath=".")
templateEnv = jinja2.Environment(loader=templateLoader,
                                 trim_blocks=True,
                                 lstrip_blocks=True)
htmlTemplateFile = 'file.jinja.html'
htmlTemplate = templateEnv.get_template(htmlTemplateFile)
if you get this problem:
Traceback (most recent call last):
  ...
  File "file.py", line xyz, in some_func
    htmlTemplate = templateEnv.get_template(htmlTemplateFile)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/jinja2/environment.py", line 812, in get_template
    return self._load_template(name, self.make_globals(globals))
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/jinja2/environment.py", line 774, in _load_template
    cache_key = self.loader.get_source(self, name)[1]
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/jinja2/loaders.py", line 187, in get_source
    raise TemplateNotFound(template)
jinja2.exceptions.TemplateNotFound: file.jinja.html
you may find that online discussions attribute this issue to the interaction of jinja2 with Flask, GAE, Pyramid, or SQL. It may indeed be that your templates are not in a "template" folder, but this problem can also arise from the interaction of jinja2 and the os module.
The culprit is changing the current directory by, for instance,
import os
os.chdir(someDir)
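If templateEnv.get_template(...) is called past this point, jinja2 will look for the templates in the "current" dir, even if that has changed, because the relative searchpath "." is resolved when the template is looked up rather than when the loader is created.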
Since module os provides os.chdir but not os.pushdir/os.popdir, one has to either simulate the latter pair or avoid chdir altogether.
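One way to simulate the pushd/popd pair is a small context manager, sketched below. The name pushd is made up for this example and is not part of the os module. The finally clause restores the previous directory even if the body raises.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def pushd(new_dir):
    """Change into new_dir for the duration of the with block, then change back."""
    previous = os.getcwd()
    os.chdir(new_dir)
    try:
        yield
    finally:
        os.chdir(previous)


start = os.getcwd()
with pushd(tempfile.gettempdir()):
    inside = os.getcwd()  # work that needs the other directory goes here
after = os.getcwd()
print(inside)
print(after == start)
```

On Python 3.11 and later, contextlib.chdir provides the same behavior out of the box. The other route, avoiding the dependency on the current directory, amounts to passing an absolute path as searchpath when creating the FileSystemLoader, so that later chdir calls no longer affect template lookup.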

jsonstat.from_file() return error "can't multiply sequence by non-int of type 'list'"

I'm trying to parse a json-stat file using jsonstat.py (v 0.1.7) but am getting an error.
The code below is copied from the examples on github (https://github.com/26fe/jsonstat.py/tree/master/examples-notebooks):
from __future__ import print_function
import os
import jsonstat
os.chdir(r'D:\Desktop\JSON_Stat')
url = 'http://www.cso.ie/StatbankServices/StatbankServices.svc/jsonservice/responseinstance/NQQ25'
file_name = "test02.json"
file_path = os.path.abspath(os.path.join("..","JSON_Stat", "CSO", file_name))
I added this line to deal with non ascii characters in the file:
# -*- coding: utf-8 -*-
This successfully downloads the JSON file to my desktop:
if os.path.exists(file_path):
    print("using already downloaded file {}".format(file_path))
else:
    print("download file and storing on disk")
    jsonstat.download(url, file_path)
From here, I can load and pprint the data using the json module:
import json
import pprint as pp
with open(r"CSO\test02.json") as data_file:
    data = json.load(data_file)
pp.pprint(data)
... but when I try and use the jsonstat module (as specified in the examples) I get the error mentioned in the subject:
collection = jsonstat.from_file(r"D:\Desktop\JSON_Stat\CSO\test02.json")
collection
# abbreviated error message
--> 384 self.__pos2cat = self.__size * [None]
TypeError: can't multiply sequence by non-int of type 'list'
I understand what the error message itself means but, having studied the dimensions.py module where it occurs, I am stuck trying to understand why. I was able to run the sample OECD code without issue, so perhaps the data itself is not formatted in the expected way, though the source site (http://www.cso.ie/webserviceclient/) states that the json-stat format is being used.
So, finally, my questions are: has anyone run into this error and resolved it? Has anyone successfully used the jsonstat module to parse this specific data? Alternatively, any general advice towards troubleshooting this issue is welcome.
Thanks
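One way to narrow this down, sketched under assumptions: the failing line multiplies self.__size by [None], so the error implies that a list arrived where an integer was expected. The hypothetical helper below (find_key is not part of jsonstat.py) walks the parsed JSON and reports every "size" entry with its location and type, which can then be compared against a file that parses cleanly, such as the OECD sample. The small document in the demo is made up and is not CSO data.

```python
import json


def find_key(node, key, path="$"):
    """Yield (path, value) for every occurrence of key in nested dicts and lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            child = "{}.{}".format(path, k)
            if k == key:
                yield child, v
            for hit in find_key(v, key, child):
                yield hit
    elif isinstance(node, list):
        for i, v in enumerate(node):
            for hit in find_key(v, key, "{}[{}]".format(path, i)):
                yield hit


# Tiny made-up document standing in for the real file.
sample = json.loads('{"dataset": {"dimension": {"size": [2, 3]}, "value": [1, 2]}}')
hits = list(find_key(sample, "size"))
for where, value in hits:
    print(where, type(value).__name__, value)
```

Running list(find_key(data, "size")) on the data loaded with json.load above, and on the OECD file, would show whether the two files place "size" at different levels or give it different types.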