Access custom attributes from Cython - cython

At some point in my code I need to know the number of the sentence a token belongs to, so I assign a custom attribute as follows:
from spacy.tokens import Token
Token.set_extension('number', default=0)
After that, I want to use fast loops over tokens in Cython and access this attribute at the C++ level. I know that there is:
Token.get_struct_attr
but it does not seem to work when used like this:
def traverse_doc(Doc doc):
    cdef int n_tokens = len(doc)
    cdef tokens = []
    cdef attr_hash = doc.vocab.strings.add('number')
    for token in doc.c[:n_tokens]:
        tokens.append(Token.get_struct_attr(&token, attr_hash))
This code produces the following tuple for each token:
(0, None, None, None)
0 seems to be the attribute's default value, but the value that I actually assign to the attribute is ignored.
What is the proper way to access custom attributes?

Token.get_struct_attr defaults to 0 if the attribute isn't found (or, more precisely, it delegates to Lexeme.get_struct_attr, which defaults to 0). The fact that the extension's default value is also 0 is just a slightly confusing coincidence. Custom extension attributes defined by the user aren't stored as C data, because they can hold any Python object and thus can't be typed.
So instead of reading from doc.c, you need to look up the token on the Python-level doc at the given index:
for i in range(n_tokens):
    # Option 1: Use the ._ attribute
    number = doc[i]._.number
    # Option 2: Use the .get method with string attribute
    number = doc[i]._.get('number')
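To make the explanation above concrete, here is a toy pure-Python model of the two storage paths. Everything in it (ToyTokenC, ToyDoc, the numeric attribute IDs, NUMBER_HASH and both helpers) is hypothetical and only mimics the behaviour the answer describes; it is not spaCy's code. A typed struct lookup only knows a fixed set of attribute IDs and falls back to 0 for anything else, while extension values live in an ordinary Python dictionary.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# Hypothetical numeric IDs for the fixed, typed attributes of the toy struct.
ATTR_LENGTH = 1
ATTR_IS_ALPHA = 2
# Hypothetical stand-in for the string-store hash of 'number'.
NUMBER_HASH = 123456789

@dataclass
class ToyTokenC:
    """Stand-in for a C struct: fixed, typed fields only, no room for extensions."""
    length: int
    is_alpha: bool

def toy_get_struct_attr(token: ToyTokenC, feat_name: int) -> int:
    # Only known attribute IDs are handled; any other number falls through to 0.
    if feat_name == ATTR_LENGTH:
        return token.length
    if feat_name == ATTR_IS_ALPHA:
        return int(token.is_alpha)
    return 0

@dataclass
class ToyDoc:
    c: List[ToyTokenC]
    # Extension values can be arbitrary Python objects, so they sit in a dict.
    ext_values: Dict[Tuple[str, int], Any] = field(default_factory=dict)

    def set_ext(self, i: int, name: str, value: Any) -> None:
        self.ext_values[(name, i)] = value

    def get_ext(self, i: int, name: str, default: Any = 0) -> Any:
        return self.ext_values.get((name, i), default)

doc = ToyDoc(c=[ToyTokenC(5, True), ToyTokenC(1, False)])
doc.set_ext(0, "number", 3)
print(toy_get_struct_attr(doc.c[0], NUMBER_HASH))  # 0: unknown ID, default value
print(doc.get_ext(0, "number"))                    # 3: Python-level lookup
```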

Related

How to enable highlight .cu CUDA files in PyCharm? [duplicate]

When it comes to constructors, and assignments, and method calls, the PyCharm IDE is pretty good at analyzing my source code and figuring out what type each variable should be. I like it when it's right, because it gives me good code-completion and parameter info, and it gives me warnings if I try to access an attribute that doesn't exist.
But when it comes to parameters, it knows nothing. The code-completion dropdowns can't show anything, because they don't know what type the parameter will be. The code analysis can't look for warnings.
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
peasant = Person("Dennis", 37)
# PyCharm knows that the "peasant" variable is of type Person
peasant.dig_filth()   # shows warning -- Person doesn't have a dig_filth method
class King:
    def repress(self, peasant):
        # PyCharm has no idea what type the "peasant" parameter should be
        peasant.knock_over()   # no warning even though knock_over doesn't exist
King().repress(peasant)
# Even if I call the method once with a Person instance, PyCharm doesn't
# consider that to mean that the "peasant" parameter should always be a Person
This makes a certain amount of sense. Other call sites could pass anything for that parameter. But if my method expects a parameter to be of type, say, pygame.Surface, I'd like to be able to indicate that to PyCharm somehow, so it can show me all of Surface's attributes in its code-completion dropdown, and highlight warnings if I call the wrong method, and so on.
Is there a way I can give PyCharm a hint, and say "psst, this parameter is supposed to be of type X"? (Or perhaps, in the spirit of dynamic languages, "this parameter is supposed to quack like an X"? I'd be fine with that.)
EDIT: CrazyCoder's answer, below, does the trick. For any newcomers like me who want the quick summary, here it is:
class King:
    def repress(self, peasant):
        """
        Exploit the workers by hanging on to outdated imperialist dogma which
        perpetuates the economic and social differences in our society.
        @type peasant: Person
        @param peasant: Person to repress.
        """
        peasant.knock_over()   # Shows a warning. And there was much rejoicing.
The relevant part is the @type peasant: Person line of the docstring.
If you also go to File > Settings > Python Integrated Tools and set "Docstring format" to "Epytext", then PyCharm's View > Quick Documentation Lookup will pretty-print the parameter information instead of just printing all the @-lines as-is.
Yes, you can use a special documentation format for methods and their parameters so that PyCharm knows the type. Recent PyCharm versions support the most common docstring formats.
For example, PyCharm extracts types from @param-style docstring fields.
See also reStructuredText and docstring conventions (PEP 257).
Another option is Python 3 annotations.
Please refer to the PyCharm documentation section for more details and samples.
If you are using Python 3.0 or later, you can also use annotations on functions and parameters. PyCharm will interpret these as the type the arguments or return values are expected to have:
class King:
    def repress(self, peasant: Person) -> bool:
        peasant.knock_over() # Shows a warning. And there was much rejoicing.
        return peasant.badly_hurt() # Let's say it's not known from here that this method will always return a bool
This is sometimes useful for non-public methods that do not need a docstring. As an added benefit, those annotations can be accessed by code:
>>> King.repress.__annotations__
{'peasant': <class '__main__.Person'>, 'return': <class 'bool'>}
Update: As of PEP 484, which has been accepted for Python 3.5, it is also the official convention to specify argument and return types using annotations.
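A minimal runnable sketch of the annotation style, assuming Python 3.8+ (for typing.Protocol). The badly_hurt method, its age rule and the Repressible protocol are hypothetical additions for illustration. The Protocol variant is one way to express "this parameter should quack like a Person", as the question asked; I believe PyCharm understands typing.Protocol, but it is worth checking in your version.

```python
from typing import Protocol, get_type_hints

class Person:
    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age

    def badly_hurt(self) -> bool:
        # Hypothetical rule, only so the method returns something.
        return self.age > 60

class Repressible(Protocol):
    """Structural type: anything with a badly_hurt() -> bool method qualifies."""
    def badly_hurt(self) -> bool: ...

class King:
    def repress(self, peasant: Person) -> bool:
        # An IDE or type checker can now flag misspelled attributes on peasant.
        return peasant.badly_hurt()

    def repress_anything(self, victim: Repressible) -> bool:
        # Accepts any object with a matching badly_hurt method, not just Person.
        return victim.badly_hurt()

print(get_type_hints(King.repress))
print(King().repress(Person("Dennis", 37)))  # False
```

get_type_hints resolves the annotations (including string forward references) into actual classes, which is more robust than reading __annotations__ directly.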
PyCharm extracts types from a @type pydoc string. See the PyCharm docs and the Epydoc docs. It's in the 'legacy' section of the PyCharm docs, so perhaps it lacks some functionality.
class King:
    def repress(self, peasant):
        """
        Exploit the workers by hanging on to outdated imperialist dogma which
        perpetuates the economic and social differences in our society.
        @type peasant: Person
        @param peasant: Person to repress.
        """
        peasant.knock_over()   # Shows a warning. And there was much rejoicing.
The relevant part is the @type peasant: Person line of the docstring.
My intention is not to steal points from CrazyCoder or the original questioner; by all means give them their points. I just thought the simple answer should be in an 'answer' slot.
I'm using PyCharm Professional 2016.1 writing Python 2.6-2.7 code, and I found that using reStructuredText I can express types more succinctly:
class Replicant(object):
    pass
class Hunter(object):
    def retire(self, replicant):
        """ Retire the rogue or non-functional replicant.
        :param Replicant replicant: the replicant to retire.
        """
        replicant.knock_over()  # Shows a warning.
See: https://www.jetbrains.com/help/pycharm/2016.1/type-hinting-in-pycharm.html#legacy
You can also assert the type, and PyCharm will infer it:
def my_function(an_int):
    assert isinstance(an_int, int)
    # PyCharm now knows that an_int is of type int
    pass

Should type information be provided for the 'self' argument in cython extension types?

I've been experimenting with wrapping C++ with cython. I'm trying to understand the implications of typing self in extension type methods.
In the docs self is not explicitly typed but it seems like there could potentially be speedups associated with typing self.
However, in my limited experimentation, explicitly typing self does not seem to yield performance increases. Is there special magic going on under the covers to handle self, or is this purely a style thing?
EDIT for clarity:
By typing self, I mean providing type information for the self argument of a method. i.e.:
cdef class foo:
    cpdef bar(self):
        # do stuff with self
        pass
vs
cdef class foo:
    cpdef bar(foo self):
        # do stuff with self
        pass
Short answer:
There is no need to verbosely type self in a class method. It's not much faster than a plain self.
Long answer:
There are indeed some differences in the generated C code (you can easily check this in a Jupyter notebook with the %%cython -a cell magic). For example:
%%cython -a
# Case 1
cdef class foo1:
    def bar(self, foo1 other):
        pass
    def __eq__(self, foo1 other):
        pass
# Case 2
cdef class foo2:
    def bar(self, foo2 other):
        pass
    def __eq__(foo2 self, foo2 other):
        pass
In the Python wrapper, self is always converted to PyObject *.
For a normal method (bar), the wrapped C function signatures are identical: in both cases self is converted to struct xxx_foo *.
For a magic method (__eq__), in the wrapped C function a plain self is converted to PyObject *, but the typed foo2 self is converted to struct xxx_foo2 *. In the latter case, the Python wrapper casts PyObject * to struct xxx_foo2 * and calls the wrapped C function. Case 2 may involve fewer pointer indirections, but there should not be much difference in performance between the two cases. Besides, case 2 does more checks in the Python wrapper. In practice, only profiling can tell you for sure.
As you already worked out, normally self is "translated" to the right type in the resulting c-code.
The only exceptions I'm aware of are the rich comparison operators, i.e. __eq__, __lt__, __le__ and so on.
The other special methods/operators like += or + work exactly in the same way as all other "normal" methods: self is automatically of the right type.
However, the behavior of the rich comparison operators will be changed soon, as it seems to be just a glitch in the newly introduced feature (see the corresponding Cython issue).
Now that we have established what Cython does, the interesting question is why Cython does it this way.
For somebody coming from statically typed languages, it is pretty obvious that self can only be of the class type for which the function is defined (exactly this class or a class derived from it), so I would expect self to have this class type. It would be a surprise if Cython behaved differently.
Yet it is probably not so clear in the age of duck typing and monkey-patching, in which classes can be changed dynamically. Let's take a look at the following example:
>>> class A:
...     def __init__(self, val):
...         self.val=val
...     def __str__(self):
...         return "value=%s"%self.val
...
>>> class B:
...     def __init__(self, val):
...         self.val="<"+val+">"
...
>>> a,b=A(1.0),B("div")
>>> print a
value=1.0
>>> print b
<__main__.B instance at 0x0000000003D24E08>
So if we don't like how print handles class B, we can monkey-patch class B:
>>> B.__str__=lambda self: "value=%s"%self.val
>>> print b
value=<div>
And if we like the way class A handles the __str__ method, we could try to "reuse" it:
>>> B.__str__=lambda self: A.__str__(self)
>>> print b
TypeError: unbound method __str__() must be called
with A instance as first argument
(got B instance instead)
So it is not possible (in Python 2, which this transcript uses): Python checks on a call to A.__str__(self) that self really is of type A.
Thus, Cython is right to use the class type for self directly rather than a generic Python object.
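The transcript above is Python 2 (old-style classes, the print statement). Python 3 removed unbound methods, so for plain classes the A.__str__ reuse succeeds; the type check on self survives for methods of types implemented in C, the kind of type a Cython cdef class produces. A small Python 3 sketch, using the built-in int as a stand-in for an extension type (whether a particular Cython version performs the same check for cdef class methods is worth verifying with your own build):

```python
class A:
    def __init__(self, val):
        self.val = val

    def __str__(self):
        return "value=%s" % self.val

class B:
    def __init__(self, val):
        self.val = "<" + val + ">"

b = B("div")

# Python 3 has no unbound methods: A.__str__ is a plain function,
# so it accepts a B instance as self without complaint.
B.__str__ = lambda self: A.__str__(self)
print(b)  # value=<div>

def wrong_self_rejected():
    """Return True if a C-implemented method descriptor rejects a foreign self."""
    try:
        int.__add__("not an int", 1)  # a str passed where an int self is required
    except TypeError:
        return True
    return False

print(wrong_self_rejected())  # True
```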

Code Effects - Cannot load a rule in the UI that has an In-Rule Method with a string parameter value longer than 255 characters

It seems that the Code Effects library 4.3.2.6 has an issue with editing rule XML in the UI, and with loading it back from storage to show it in the UI for editing, when the rule XML contains an In-Rule Method call whose string parameter value is longer than 255 characters.
Is this intentional, to prevent long rules from being edited in the UI, or just a bug? Does anyone know a workaround?
To rule out side effects from my own code, I downloaded the Business Rule example from the Code Effects Demo Projects on the Code Effects site and opened it in VS2015.
In the "Patient.cs" file I added the following code:
public class Patient
{
    ...
    // In Rule Method that accepts only one string parameter
    [Method("[NumberOfSegments]")]
    public int RuleMethod01(
        // explicitly specify maximum string allowed
        [Parameter(ValueInputType.User, Max = 10000)]
        string val)
    {
        return val.Split(',').Length;
    }
}
In the UI (using the Ajax controller) I attempted to create a rule with a long string parameter (in the real project I need such a long string, since it contains unique parameters for the In-Rule Method to use in calculations, and I cannot rely on the data sources approach that Code Effects offers):
Check if [NumberOfSegments] ("1111,2222,33333,4444,55555,6666,777,8888,999,0000,1111,222,333,44444,1231231,123123123,123123123,123123123,123123123,123123123,123123123,123123123,123123123,123132123,123123123,123123123,123123123,123123123,12123123,123123123,123123123,123123123,1231231233") greater than 12
But even though I explicitly set the maximum string length of the parameter to 10000 in the Parameter attribute, the UI does not allow me to enter a string longer than 256 characters.
The documentation on the Code Effects site (Business-Rules-Data-Types) does not mention any built-in restrictions; the only way it describes to restrict the length of a string parameter is the Parameter attribute and its Max property.
Has anyone run into this kind of "synthetic" restriction, and can you point me to the documentation or a workaround?
Thank you in advance for any meaningful suggestions.
PS: A small update: when I manually edited the rule XML file and provided a longer string as the parameter (around 500 characters), I could not load it from the XML back into the UI. The RuleEditor::Rule::InvalidElements collection contained one element whose Hint property value was "v122". I don't know if that helps, but maybe the Code Effects authors know what the v122 hint means.
Strings longer than 255 characters (whether property values or method params) are not supported in Code Effects. v122 is the error number that you get in response. Its original message is "The length of the value of this string element exceeds the maximum allowed limit".

I just started learning Python. I want to get a file name as user input

def copy_file(from_file,to_file):
    content = open(from_file).read()
    target = open(to_file,'w').write(content)
    print open(to_file).read()
def user_input(f1):
    f1 = raw_input("Enter the source file : ")
user_input(f1)
user_input(f2)
copy_file(user_input(f1),user_input(f2))
What is the mistake here? I tried it with argv and it worked.
You're not calling the function user_input (by using ()). (fixed in question by OP).
Also, you need to return a string from user_input. Currently you're trying to set a variable f1 that is local to the function user_input. While this is possible using global, I do not recommend it (it defeats the purpose of keeping your code DRY).
It's possible to do something similar with objects by changing their state. A string is an object, but since strings are immutable, a function cannot change their state, so the approach of expecting a function to change the string it's given is also doomed to fail.
def user_input():
    return raw_input("Enter the source file :").strip()
copy_file(user_input(),user_input())
As you can see, user_input does very little; it's actually redundant if you assume the user input is valid.
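For reference, a Python 3 sketch of the corrected program (the code in the question is Python 2: print statement and raw_input). It also uses with blocks so every file gets closed. The read parameter of user_input is my own addition so the function can be exercised without typing; by default it is the built-in input.

```python
def user_input(prompt, read=input):
    """Ask for a file name and return it without surrounding whitespace."""
    return read(prompt).strip()

def copy_file(from_file, to_file):
    """Copy the text of from_file into to_file and return what was written."""
    with open(from_file) as src:
        content = src.read()
    with open(to_file, "w") as dst:
        dst.write(content)
    # Re-read the target to confirm the copy, like the original print did.
    with open(to_file) as check:
        return check.read()
```

Interactive use would then be print(copy_file(user_input("Enter the source file: "), user_input("Enter the target file: "))). Each call to user_input returns a new string, which is exactly what the original version was missing.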

What is the difference between a property and an instance variable?

I think I've been using these terms interchangably / wrongly!
Iain, this is basically a terminology question and is, despite the "language-agnostic" tag associated with this question, very language/environment related.
For design discussion's sake, property and instance variable can be used interchangeably, since the idea is that a property is a data item describing an object.
When talking about a specific language these two can be different. For example, in C# a property is actually a function that returns an object, while an instance variable is a non-static member variable of a class.
Hershi is right about this being language specific. But to add to the trail of language specific answers:
In Python, an instance variable is an attribute of an instance, (generally) something that is referred to in the instance's dictionary. This is analogous to members or instance variables in Java, except everything is public.
Properties are shortcuts to getter/setter methods that look just like an instance variable. Thus, in the following class definition (modified from Guido's new style object manifesto):
class C(object):
    def __init__(self):
        self.y = 0
    def getx(self):
        if self.y < 0: return 0
        else: return self.y
    def setx(self, x):
        self.y = x
    x = property(getx, setx)
>>> z = C()
>>> z.x = -3
>>> print z.x
0
>>> print z.y
-3
>>> z.x = 5
>>> print z.x
5
>>> print z.y
5
y is an instance variable of z, x is a property. (In general, where a property is defined, there are some techniques used to obscure the associated instance variable so that other code doesn't directly access it.) The benefit of properties in python is that a designer doesn't have to go around pre-emptively encapsulating all instance variables, since future encapsulation by converting an instance variable to a property should not break any existing code (unless the code is taking advantage of loopholes your encapsulation is trying to fix, or relying on class inspection or some other meta-programming technique).
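The same "encapsulate later" idea with the @property decorator in Python 3, as a sketch: Account, CheckedAccount, client_code and the clamping rule are hypothetical names chosen for illustration. The caller is written once against a plain attribute and keeps working unchanged when the attribute becomes a property.

```python
class Account:
    """Version 1: balance is a plain instance variable."""
    def __init__(self, balance):
        self.balance = balance

class CheckedAccount:
    """Version 2: balance became a property; callers use the same syntax."""
    def __init__(self, balance):
        self.balance = balance  # goes through the setter below

    @property
    def balance(self):
        return self._balance

    @balance.setter
    def balance(self, value):
        # Hypothetical rule: clamp negative values to 0, in the spirit of getx above.
        self._balance = max(0, value)

def client_code(acct):
    # Unchanged caller: works with either version.
    acct.balance = -3
    return acct.balance

print(client_code(Account(10)))         # -3
print(client_code(CheckedAccount(10)))  # 0
```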
All this is a very long answer to say that at the design level, it's good to talk about properties. It is agnostic as to what type of encapsulation you may need to perform. I guess this principle isn't language agnostic, but it does apply to languages besides Python.
In Objective-C, a property is an instance variable that can take advantage of an overloaded dot operator to call its setter and getter. So my.food = @"cheeseburger" is actually interpreted as [my setFood:@"cheeseburger"]. This is another case where the definition is definitely not language agnostic, because Objective-C defines the @property keyword.
A code example in C#:
public class ClassName
{
    private string variable;
    public string property
    {
        get { return variable; }
        set { variable = value; }
    }
}
Maybe that's because you came from C++ first, right?!
In my school days I had professors who said class properties or class attributes all the time. Since I moved to the Java/C# world, I started hearing about members: class members, instance members...
And then properties appear, in Java and .NET! So I think it's better for you to call them members, whether they are instance members (or, as you called them, instance variables) or class members.
Cheers!
A property can, and I suppose mostly does, return an instance variable but it can do more. You could put logic in a property, aggregate values or update other instance variables etc. I think it is best to avoid doing so however. Logic should go into methods.
In Java we have something called JavaBeans properties, but that is basically an instance variable that follows a certain naming pattern for its getter and setter.
To add to what has been said, in a language like C#, a property is essentially a get and set function. As a result, it can have custom logic that runs in addition to the getting/setting. An instance variable cannot do this.
A property is some sort of data associated with an object. For instance, a property of a circle is its diameter, and another is its area.
An instance variable is a piece of data that is stored within an object. It doesn't necessarily need to correspond directly with a property. For instance (heh), a circle may store its radius in an instance variable, and calculate its diameter and area based on that radius. All three are still properties, but only the radius is stored in an instance variable.
Some languages have the concept of "first class" properties. This means that to a client application, the property looks and is used like an instance variable. That is, instead of writing something like circle.getDiameter(), you would write circle.diameter, and instead of circle.setRadius(5), you would write circle.radius = 5.
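Python is one of the languages with first-class properties in this sense. A sketch of the circle described above, where Circle is a hypothetical class: only the radius is stored as an instance variable, while diameter and area are computed properties (diameter is also settable, so all three read like plain attributes).

```python
import math

class Circle:
    def __init__(self, radius):
        self.radius = radius  # the only value actually stored on the instance

    @property
    def diameter(self):
        return 2 * self.radius

    @diameter.setter
    def diameter(self, value):
        # Setting the diameter just updates the stored radius.
        self.radius = value / 2

    @property
    def area(self):
        return math.pi * self.radius ** 2

circle = Circle(2)
print(circle.diameter)  # 4
circle.radius = 5       # instead of circle.setRadius(5)
print(circle.diameter)  # 10
circle.diameter = 3
print(circle.radius)    # 1.5
```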
In contrast to the other answers given, I do think that there is a useful distinction between member variables and properties that is language-agnostic.
The distinction is most apparent in component-oriented programming, which is useful anywhere, but easiest to understand in a graphical UI. In that context, I tend to think of the design-time configuration of a component as manipulating the "properties" of an object. For example, I choose the foreground and background colors, the border style, and font of a text input field by setting its properties. While these properties could be changed at runtime, they typically aren't. At runtime, a different set of variables, representing the content of the field, are much more likely to be read and written. I think of this information as the "state" of the component.
Why is this distinction useful? When creating an abstraction for wiring components together, usually only the "state" variables need to be exposed. Going back to the text field example, you might declare an interface that provides access to the current content. But the "properties" that control the look and feel of the component are only defined on a concrete implementation class.
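A Python sketch of that split, under my own hypothetical naming (TextSource, TextField and word_count are not from any library): the abstract interface used for wiring exposes only the content "state", while the look-and-feel "properties" exist only on the concrete class.

```python
from abc import ABC, abstractmethod

class TextSource(ABC):
    """Interface used for wiring components together: exposes state only."""
    @property
    @abstractmethod
    def content(self) -> str: ...

class TextField(TextSource):
    """Concrete component: design-time properties live only here."""
    def __init__(self, foreground="black", background="white", font="Sans"):
        self.foreground = foreground
        self.background = background
        self.font = font
        self._content = ""

    @property
    def content(self) -> str:
        return self._content

    @content.setter
    def content(self, value: str) -> None:
        self._content = value

def word_count(source: TextSource) -> int:
    # Wiring code depends only on the interface's state, never on colors or fonts.
    return len(source.content.split())

field = TextField(font="Mono")
field.content = "hello wiring world"
print(word_count(field))  # 3
```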