Python 2 to 3: telling 2to3 "I got this" - python-2to3

With linters or coverage.py, you can tell the tool to ignore certain parts of your code.
For example, # pragma: no cover tells coverage not to count an exception branch as missing:
except (Exception,) as e:  # pragma: no cover
    if cpdb(): pdb.set_trace()
    raise
Now, I know I can exclude specific fixers from 2to3. For example, to avoid fixing imports below, I can use 2to3 test_import_stringio.py -x imports.
But can I use code annotations/directives to keep the fixer active, except at certain locations? For example, this bit of code is already adjusted to work in both 2 and 3.
# this import should work fine in 2 AND 3.
try:
    from io import StringIO
except ImportError:
    # pragma-for-2to3: skip
    from StringIO import StringIO
But 2to3 helpfully converts it anyway, because there is no such directive/pragma.
And now this won't work in 2:
# this test should work fine in 2 and 3.
try:
    from io import StringIO
except ImportError:
    # pragma-for-2to3: skip
    from io import StringIO
The reason I am asking is that I want to avoid a big-bang approach. I intend to refactor the code bit by bit, starting with the unit tests, to run under 2 and 3.
I am guessing this is not possible; I'm just looking at my options. What I'll probably end up doing is run the converter on only one fixer at a time, with -f imports for example, check what it ended up doing, apply those changes manually to the code myself, and then exclude imports from future consideration with -x imports.
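(For reference, a minimal sketch of that fixer-by-fixer workflow, assuming a hypothetical helper and that 2to3 is on the PATH; without -w, 2to3 only prints a diff, so nothing is overwritten:)
# hypothetical helper: preview a single fixer's changes without writing them,
# so they can be applied by hand and that fixer excluded afterwards
import subprocess

def preview_fixer(path, fixer="imports"):
    # -f limits 2to3 to one fixer; without -w it only prints a unified diff
    return subprocess.check_output(["2to3", "-f", fixer, path])

print(preview_fixer("test_import_stringio.py").decode())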

Related

Is cython compatible with typing.NamedTuple?

I have the following code in file temp.py
from typing import NamedTuple

class C(NamedTuple):
    a: int
    b: int

c = C(1, 2)
I compile it using the command:
cythonize -3 -i temp.py
and run it using the command
python3 -c 'import temp'
I get the following exception:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "temp.py", line 7, in init temp
    c = C(1, 2)
TypeError: __new__() takes 1 positional argument but 3 were given
Version of python: 3.6.15
Version of cython: 0.29.14
Is there anything wrong with the above code/build steps?
It'll work in the current Cython 3 alpha version (and later). It won't work in Cython 0.29.x (you're using a pretty outdated version of this, but that won't affect this feature).
It requires classes to have an __annotations__ dictionary, which is a feature that was added in the Cython 3 alpha releases.
You won't get much/any speed advantage from compiling this in Cython though - it'll still generate a normal Python class. But it will work.
In short, NO, it is not compatible. Edit: not currently compatible.
Named tuples are just Python magic (creating classes at runtime); Cython doesn't know about it, so you have to execute that code by calling the interpreter at runtime, using exec.
# temp.pyx
temp_global = {}
exec("""
from typing import NamedTuple
class C(NamedTuple):
    a: int
    b: int
""", temp_global)

C = temp_global['C']
c = C(1, 2)
print(c)
to test it
import pyximport
pyximport.install()
import temp
This ends up being Python code that is executed whenever you import your binary: the entire block is passed to exec on import, so it's not really "Cython code". You can just write it as a Python .py file and avoid Cython, or implement your "Cython class" without relying on Python magic (no named tuples or other dynamic code created at runtime).
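(A minimal sketch of that last route, assuming plain collections.namedtuple is acceptable: the class is built by an ordinary function call rather than class-body annotations, so it should compile under Cython 0.29.x as normal Python code:)
# temp.pyx (hypothetical alternative) - untyped, but avoids the __annotations__ requirement
from collections import namedtuple

C = namedtuple("C", ["a", "b"])  # ordinary runtime call, no class-body annotations
c = C(1, 2)
print(c)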

Admin import - Group not found

I am trying to load multiple csv files into a new db using the neo4j-admin import tool on a machine running Debian 11. To try to ensure there are no collisions in the ID fields, I've given every one of my node and relationship files its own ID space.
However, I'm getting this error:
org.neo4j.internal.batchimport.input.HeaderException: Group 'INVS' not found. Available groups are: [CUST]
This is super frustrating, as I know that the INVS group definitely exists. I've checked every file that uses that ID space and they all include it. Another strange thing is that there are more ID spaces than just CUST and INVS. It feels like it's trying to load relationships before it finishes loading all of the nodes for some reason.
Here is what I'm seeing when I search through my input files
$ grep -r -h "(INV" ./import | sort | uniq
:ID(INVS),total,:LABEL
:START_ID(INVS),:END_ID(CUST),:TYPE
:START_ID(INVS),:END_ID(ITEM),:TYPE
The top one is from my $NEO4J_HOME/import/nodes folder, the other two are in my $NEO4J_HOME/import/relationships folder.
Is there a nice solution to this? Or have I just stumbled upon a bug here?
Edit: here's the command I've been using from within my $NEO4J_HOME directory:
neo4j-admin import --force=true --high-io=true --skip-duplicate-nodes --nodes=import/nodes/\.* --relationships=import/relationships/\.*
Indeed, such a thing would be great, but I don't think it's possible at the moment.
Anyway, it doesn't seem to be a bug.
I suppose it may be intended behavior and/or a feature not yet implemented.
In fact, the documentation regarding regular expressions says:
Assume that you want to include a header and then multiple files that matches a pattern, e.g. containing numbers.
In this case a regular expression can be used
while the description of the --nodes option says:
Node CSV header and data. Multiple files will be
logically seen as one big file from the
perspective of the importer. The first line must
contain the header. Multiple data sources like
these can be specified in one import, where each
data source has its own header.
So, it appears that neo4j-admin import treats --nodes=import/nodes/\.* as a single .csv with the first header found, hence the error.
Conversely, with multiple --nodes options there are no problems.
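(A sketch of what that could look like, with hypothetical file names; each group gets its own --nodes / --relationships option, and the first file of each option carries that group's header:)
neo4j-admin import --force=true --high-io=true --skip-duplicate-nodes \
    --nodes=import/nodes/customers_header.csv,import/nodes/customers.csv \
    --nodes=import/nodes/invoices_header.csv,import/nodes/invoices.csv \
    --relationships=import/relationships/inv_cust_header.csv,import/relationships/inv_cust.csv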

Using sys.stdout.write() to create multiple files in NiFi?

I have a pipeline in NiFi that pulls down some invalid JSON that I need to clean up. The best solution I've concocted is to run a Python script via ExecuteStreamCommand and simultaneously clean/split it up in one fell swoop. However, even though I use sys.stdout.write() in my for loop, only the original JSON comes out in the output stream in NiFi.
Am I misusing sys.stdout.write() or is this possible, but I've just done something wrong? My end goal is for each line of the json to be a new flow file, i.e. file 1 is {"fruit":"apple",..., file 2 is {"fruit":"cherry",..., and so on.
example JSON
{"fruit":"apple", "vegetable":"celery", "location":{"country":"nor\\way", "city":"oslo", }, "color":"blue"}
{"fruit":"cherry", "vegetable":"kale", "location":{"country":"france", "city":"calais", }, "color":"green"}
{"fruit":"peach", "vegetable":"peas", "location":{"country":"united\\kingdom", "city":"london", }, "color":"yellow"}
script
import json
import re
import sys

flow_file = sys.stdin.read()
try:
    load = json.loads(flow_file)
    sys.stdout.write(flow_file)
except:
    flow_file_esc = re.sub(r"[(\\)]", "", flow_file)
    for f in flow_file_esc.splitlines():
        sys.stdout.write(str(f))
Can you clean the file first with ReplaceText and then split it with SplitJson, SplitRecord, or ForkRecord?
If you need to combine the two operations and want to script it, you could try ExecuteScript with Jython (since it doesn't look like you're using native CPython libraries). I have some simple examples in my cookbook and my blog.
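(For illustration, a rough, untested sketch of the usual ExecuteScript/Jython pattern for this kind of clean-and-split job; the session/callback usage follows the standard NiFi scripting examples, and the cleanup regex is taken from the question's script:)
# ExecuteScript body (Jython) - emits one child flow file per cleaned line
import re
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import InputStreamCallback, OutputStreamCallback

class ReadAll(InputStreamCallback):
    def __init__(self):
        self.text = None
    def process(self, inputStream):
        self.text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)

class WriteLine(OutputStreamCallback):
    def __init__(self, line):
        self.line = line
    def process(self, outputStream):
        outputStream.write(bytearray(self.line.encode('utf-8')))

original = session.get()
if original is not None:
    reader = ReadAll()
    session.read(original, reader)
    cleaned = re.sub(r"[(\\)]", "", reader.text)  # same cleanup as the question's script
    for line in cleaned.splitlines():
        child = session.create(original)
        child = session.write(child, WriteLine(line))
        session.transfer(child, REL_SUCCESS)
    session.remove(original)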

simplejson json AttributeError: "module" object has no attribute "dump"

I am new at programming and I have a question regarding the following error.
I am using Python 2.7, and I have the following script to create a simple graph (example taken from Python Crash Course by Eric Matthes):
import matplotlib.pyplot as plt

squares = [1, 4, 9, 16, 25]
plt.plot(squares, linewidth=5)

# Set chart title and label axes.
plt.title("Square Numbers", fontsize=24)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)

# Set size of tick labels.
plt.tick_params(axis="both", labelsize=14)

plt.show()
When I ran this script in Windows PowerShell I got the following error:
Traceback (most recent call last):
  File "mpl_squares.py", line 1, in <module>
    import matplotlib.pyplot as plt
  File "C:\Users\Roger\Anaconda2\lib\site-packages\matplotlib\__init__.py", line 134, in <module>
    from ._version import get_versions
  File "C:\Users\Roger\Anaconda2\lib\site-packages\matplotlib\_version.py", line 7, in <module>
    import json
  File "C:\Users\Roger\Desktop\lpthw\json.py", line 7, in <module>
AttributeError: "module" object has no attribute "dump"
In another script I had the same problem when importing this module; back then I found
a solution by replacing the line "import json" with "import simplejson", and it worked well.
Here is the explanation of the solution I found back then:
json is simplejson, added to the stdlib. But since json was added in 2.6, simplejson has the advantage of working on more Python versions (2.4+).
simplejson is also updated more frequently than Python, so if you need (or want) the latest version, it's best to use simplejson itself, if possible.
A good practice, in my opinion, is to use one or the other as a fallback.
try: import simplejson as json
except ImportError: import json
Now I checked the module the error is pointing to, _version.py.
This is the information contained in this file:
# This file was generated by 'versioneer.py' (0.15) from
# revision-control system data, or from the parent directory name of an
# unpacked source archive. Distribution tarballs contain a pre-generated
# copy of this file.
import json
import sys

version_json = '''
{
 "dirty": false,
 "error": null,
 "full-revisionid": "26382a72ea234ee0efd40543c8ae4a30cffc4f0d",
 "version": "1.5.3"
}
'''  # END VERSION_JSON


def get_versions():
    return json.loads(version_json)
Question:
Do you think I would have to fix something in the _version.py module by replacing
import json with import simplejson, plus adding the fallback to the module?
I am thinking of a workaround to fix the problem, but I don't want to modify anything in _version.py if it makes things worse. Thank you very much for your comments and suggestions.
Best Regards
It seems like your C:\Users\Roger\Desktop\lpthw\json.py gets imported instead of Python's built-in json module.
Did you somehow add that folder (C:\Users\Roger\Desktop\lpthw) to your PYTHONPATH, e.g. with sys.path.append() or the PYTHONPATH variable? Read more about how Python finds modules.
The reason why the fix with simplejson works is that it is not overridden by some other module of the same name.
Try renaming C:\Users\Roger\Desktop\lpthw\json.py to something like C:\Users\Roger\Desktop\lpthw\myjson.py and also try to figure out how that lpthw folder made it into your PYTHONPATH.
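(A quick way to confirm the shadowing, run from the same shell where the error appears; if the printed path ends in lpthw\json.py rather than in the standard library, the local file is the culprit:)
import json
print(json.__file__)  # shows which json module actually got imported

import sys
print(sys.path[:3])   # the first entries are searched first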

Cython undefined symbol with c wrapper

I am trying to expose C code to Cython and am running into "undefined symbol" errors when trying to use functions defined in my .c file from another Cython module.
Functions defined in my .h files and functions using a manual wrapper work without a problem.
This is basically the same case as this question, but the solution (linking against the library) isn't satisfactory for me.
I assume I am missing something in the setup.py script?
Minimized example of my case:
foo.h
int source_func(void);
inline int header_func(void){
    return 1;
}
foo.c
#include "foo.h"
int source_func(void){
    return 2;
}
foo_wrapper.pxd
cdef extern from "foo.h":
    int source_func()
    int header_func()

cdef source_func_wrapper()
foo_wrapper.pyx
cdef source_func_wrapper():
    return source_func()
The cython module i want to use the functions in:
test_lib.pyx
cimport foo_wrapper

def do_it():
    print "header func"
    print foo_wrapper.header_func()          # ok
    print "source func wrapped"
    print foo_wrapper.source_func_wrapper()  # ok
    print "source func"
    print foo_wrapper.source_func()          # undefined symbol: source_func
setup.py builds both foo_wrapper and test_lib:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Build import cythonize
# setup wrapper
setup(
    ext_modules = cythonize([
        Extension("foo_wrapper", ["foo_wrapper.pyx", "foo.c"])
    ])
)

# setup test module
setup(
    ext_modules = cythonize([
        Extension("test_lib", ["test_lib.pyx"])
    ])
)
There are 3 different types of function in foo_wrapper:
source_func_wrapper is a Python function, and the Python runtime handles calling it.
header_func is an inline function which is inlined at compile time, so its definition/machine code is not needed later on.
source_func, on the other hand, must be handled by the static linker (this is the case in foo_wrapper) or the dynamic linker (I assume this is what you want for test_lib).
Further down I'll try to explain why the setup doesn't work out of the box, but first I would like to introduce the two (at least in my opinion) best alternatives:
A: avoid this problem altogether. Your foo_wrapper wraps the C functions from foo.h. That means every other module should use these wrapper functions; if everyone can just access the functionality directly, the whole wrapper becomes kind of obsolete. Hide the foo.h interface in your pyx-file:
# foo_wrapper.pxd
cdef source_func_wrapper()
cdef header_func_wrapper()

# foo_wrapper.pyx
cdef extern from "foo.h":
    int source_func()
    int header_func()

cdef source_func_wrapper():
    return source_func()

cdef header_func_wrapper():
    return header_func()
B: It might be valid to want to use the foo functionality directly as C functions. In this case we should use the same strategy as Cython does with the C++ standard library: foo.c should become a shared library, and there should be only a foo.pxd file (no pyx!) which can be cimported wherever needed. Additionally, libfoo.so should then be added as a dependency to both foo_wrapper and test_lib.
However, approach B means more hassle: you need to put libfoo.so somewhere the dynamic loader can find it...
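(A minimal sketch of approach B, assuming foo.c is compiled separately into a hypothetical libfoo.so; the pxd carries only declarations, and both extension modules link against the shared library:)
# foo.pxd - declarations only, the implementation lives in libfoo.so
cdef extern from "foo.h":
    int source_func()
    int header_func()

# in setup.py, both extensions would then link against it, e.g. (paths are placeholders):
#   Extension("test_lib", ["test_lib.pyx"],
#             libraries=["foo"], library_dirs=["."],
#             extra_link_args=["-Wl,-rpath=$ORIGIN/."])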
Other alternatives:
As we will see, there are a lot of ways to get foo_wrapper + test_lib to work. First, let's see in more detail how the loading of dynamic libraries works in python.
We start out by taking a look at the test_lib.so at hand:
>>> nm test_lib.so --undefined
....
U PyXXXXX
U source_func
There are a lot of undefined symbols, most of which start with Py and will be provided by the python executable at runtime. But there is also our evildoer: source_func.
Now, we start python via
LD_DEBUG=libs,files,symbols python
and load our extension via import test_lib. In the resulting debug trace we can see the following:
>>>>: file=./test_lib.so [0]; dynamically loaded by python [0]
python loads test_lib.so via dlopen and starts to look up/resolve the undefined symbols from test_lib.so:
>>>>: symbol=PyExc_RuntimeError; lookup in file=python [0]
>>>>: symbol=PyExc_TypeError; lookup in file=python [0]
These python symbols are found pretty quickly - they are all defined in the python executable, the first place the dynamic linker looks (if this executable was linked with -Wl,-export-dynamic). But it is different with source_func:
>>>>: symbol=source_func; lookup in file=python [0]
>>>>: symbol=source_func; lookup in file=/lib/x86_64-linux-gnu/libpthread.so.0 [0]
...
>>>>: symbol=source_func; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
>>>>: ./test_lib.so: error: symbol lookup error: undefined symbol: source_func (fatal)
So after looking through all loaded shared libraries the symbol is not found, and we have to abort. The fun fact is that foo_wrapper is not yet loaded, so source_func cannot be looked up there (it would be loaded in the next step, as a dependency of test_lib, by python).
What happens if we start python with preloaded foo_wrapper.so?
LD_DEBUG=libs,files,symbols LD_PRELOAD=$(pwd)/foo_wrapper.so python
This time, calling import test_lib succeeds, because the preloaded foo_wrapper is the first place the dynamic loader looks up the symbols (after the python executable):
>>>>: symbol=source_func; lookup in file=python [0]
>>>>: symbol=source_func; lookup in file=/home/ed/python_stuff/cython/two/foo_wrapper.so [0]
But how does it work when foo_wrapper.so is not preloaded? First, let's add foo_wrapper.so as a library to our setup of test_lib:
ext_modules = cythonize([
    Extension("test_lib", ["test_lib.pyx"],
              libraries=[':foo_wrapper.so'],
              library_dirs=['.'],
    )])
this would lead to the following linker command:
gcc ... test_lib.o -L. -l:foo_wrapper.so -o test_lib.so
If we now look at the undefined symbols, we see no difference:
>>> nm test_lib.so --undefined
....
U PyXXXXX
U source_func
source_func is still undefined! So what is the advantage of linking against the shared library? The difference is that now foo_wrapper.so is listed as needed for test_lib.so:
>>>> readelf -d test_lib.so| grep NEEDED
0x0000000000000001 (NEEDED) Shared library: [foo_wrapper.so]
0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
ld does not link (that is the job of the dynamic linker), but it does a dry run and helps the dynamic linker by noting that foo_wrapper.so is needed in order to resolve the symbols, so it must be loaded before the symbol search starts. However, it does not explicitly say that the symbol source_func must be looked up in foo_wrapper.so; it could actually be found and used anywhere.
Let's start python again, this time without preloading:
>>>> LD_DEBUG=libs,files,symbols python
>>>> import test_lib
....
>>>> file=./test_lib.so [0]; dynamically loaded by python [0]....
>>>> file=foo_wrapper.so [0]; needed by ./test_lib.so [0]
>>>> find library=foo_wrapper.so [0]; searching
>>>> search cache=/etc/ld.so.cache
.....
>>>> foo_wrapper.so: cannot open shared object file: No such file or directory.
OK, now the dynamic linker knows it has to find foo_wrapper.so, but it is nowhere on the search path, so we get an error message.
We have to tell the dynamic linker where to look for shared libraries. There are a lot of ways; one of them is to set LD_LIBRARY_PATH:
LD_DEBUG=libs,symbols,files LD_LIBRARY_PATH=. python
>>>> import test_lib
....
>>>> find library=foo_wrapper.so [0]; searching
>>>> search path=./tls/x86_64:./tls:./x86_64:. (LD_LIBRARY_PATH)
>>>> ...
>>>> trying file=./foo_wrapper.so
>>>> file=foo_wrapper.so [0]; generating link map
This time foo_wrapper.so is found (the dynamic loader looked at the places hinted at by LD_LIBRARY_PATH), loaded, and then used for resolving the undefined symbols in test_lib.so.
But what is the difference if the runtime_library_dirs setup argument is used?
ext_modules = cythonize([
    Extension("test_lib", ["test_lib.pyx"],
              libraries=[':foo_wrapper.so'],
              library_dirs=['.'],
              runtime_library_dirs=['.']
    )
])
and now calling
LD_DEBUG=libs,symbols,files python
>>>> import test_lib
....
>>>> file=foo_wrapper.so [0]; needed by ./test_lib.so [0]
>>>> find library=foo_wrapper.so [0]; searching
>>>> search path=./tls/x86_64:./tls:./x86_64:. (RPATH from file ./test_lib.so)
>>>> trying file=./foo_wrapper.so
>>>> file=foo_wrapper.so [0]; generating link map
foo_wrapper.so is found via a so-called RPATH, even though nothing was set via LD_LIBRARY_PATH. We can see this RPATH being inserted by the static linker:
>>>> readelf -d test_lib.so | grep RPATH
0x000000000000000f (RPATH) Library rpath: [.]
However, this path is relative to the current working directory, which most of the time is not what is wanted. One should pass an absolute path, or use
ext_modules = cythonize([
    Extension("test_lib", ["test_lib.pyx"],
              libraries=[':foo_wrapper.so'],
              library_dirs=['.'],
              extra_link_args=["-Wl,-rpath=$ORIGIN/."]  # rather than runtime_library_dirs
    )
])
to make the path relative to the current location of the resulting shared library (which can change, for example, through copying/moving). readelf now shows:
>>>> readelf -d test_lib.so | grep RPATH
0x000000000000000f (RPATH) Library rpath: [$ORIGIN/.]
which means the needed shared library will be searched for relative to the path of the loading shared library, i.e. test_lib.so.
That is also how your setup should look if you would like to reuse the symbols from foo_wrapper.so, which I do not advocate.
There are however some possibilities to use the libraries you have already built.
Let's go back to the original setup. What happens if we first import foo_wrapper (as a kind of preload) and only then test_lib? I.e.:
>>>> import foo_wrapper
>>>> import test_lib
This doesn't work out of the box. But why? Obviously, the symbols loaded from foo_wrapper are not visible to other libraries. Python uses dlopen for dynamic loading of shared libraries, and as explained in this good article, several different strategies are possible. We can use
>>>> import sys
>>>> sys.getdlopenflags()
>>>> 2
to see which flags are set. 2 means RTLD_NOW, which means that the symbols are resolved directly upon loading of the shared library. We need to OR the flag with RTLD_GLOBAL=256 to make the symbols visible globally/outside of the dynamically loaded library.
>>> import sys; import ctypes;
>>> sys.setdlopenflags(sys.getdlopenflags()| ctypes.RTLD_GLOBAL)
>>> import foo_wrapper
>>> import test_lib
and it works, our debug trace shows:
>>> symbol=source_func; lookup in file=./foo_wrapper.so [0]
>>> file=./foo_wrapper.so [0]; needed by ./test_lib.so [0] (relocation dependency)
Another interesting detail: foo_wrapper.so is loaded only once, because python does not load a module twice via import foo_wrapper. But even if it were opened twice, it would be in memory only once (the second load just increases the reference count of the shared library).
But now, with our newly won insight, we could go even further:
>>>> import sys;
>>>> sys.setdlopenflags(1|256)  # RTLD_LAZY | RTLD_GLOBAL
>>>> import test_lib
>>>> test_lib.do_it()
>>>> ... it works! ....
Why does this work? RTLD_LAZY means that the symbols are resolved not directly upon loading but when they are used for the first time. But before the first usage (test_lib.do_it()), foo_wrapper gets loaded (via the import inside the test_lib module), and due to RTLD_GLOBAL its symbols can be used for resolution later on.
If we don't use RTLD_GLOBAL, the failure comes only when we call test_lib.do_it(), because the needed symbols from foo_wrapper are not visible globally in this case.
As to the question of why it is not such a great idea to just link both modules foo_wrapper and test_lib against foo.c: singletons, see this.