How would you define a keyword beyond the fact that it is reserved. - function

My question is Python specific (3.4.3).
My question is specific to Built-In functions only.
It is clear to me the difference between a keyword (reserved word) and an identifier (user-defined variable).
See: https://en.wikipedia.org/wiki/Reserved_word
Likewise, I understand the basic meaning of the terminology 'function'
See : https://en.wikipedia.org/wiki/Functional_programming \
and
http://www.learnpython.org/en/Functions
However, I am having difficulty understanding the difference between Built-in functions and keywords; such as 'if' and 'for'.
https://docs.python.org/3/library/functions.html#built-in-funcs
What is the difference between the two? Keyword and Function.
Is the Keyword 'if' not simply a built in function? If so, why does it not appear in the official list of Built-In functions in the Python documentation?
https://docs.python.org/3/library/functions.html#built-in-funcs
It certainly behaves as a function. Is it simply because it preforms a procedure as opposed to returning a value? In which case how would you define it? As a method?
I have searched high and low on stackoverflow and I cannot seem to locate an answer.
Answers such as the two examples given below do not answer the overriding questions for me. Which are;
1) What defines a keyword as a keyword, rather than a builtIn function?
2) If keywords such as 'if' are not functions, then what are they? They are not classes etc. I understand that 'IF' is an example of a condition statement but what is the generic terminology for these keywords. The word keyword only defines the fact that it is reserved within the language, it does not define what the actual object is, i.e. function, class, method etc.
http://stackoverflow.com/questions/6054672/whats-the-difference-between-a-keyword-or-a-statement-and-a-function-call
http://stackoverflow.com/questions/155609/difference-between-a-method-and-a-function?rq=1

Keywords are those that describe the action to be performed, or specify how to interpret something (give meaning to instructions)
Functions are simply labels (for a set of instructions).
If you change function names it won't matter to Python (you can edit built in modules), but you can't relabel keywords.
You have already added tons of references to both, so I will not cite more.

Related

Having Multiple Commands for Calling a Specific Programming Language: To Provide a Delimiter-less Option or Not?

After re-reading the off/on topic lists, I'm still not certain if this question is best posted to this site, so apologies in advance, if it is not.
Overview:
I am working on a project that mixes several programming languages and we are trying to determine important considerations for the command used to call one in particular.
For definiteness, I will list the specific languages; however, I think the principles ought to be general, so familiarity with these specific languages is not really essential.
Specific Context
Specifically, we are using: Maxima, KaTeX, Markdown and HTML). While building the prototype, we have used the following (I believe, standard) conventions:
KaTeX delimited by $ $ or $$ $$;
HTML delimited by < > </ > pairs;
Markdown works anywhere in the body, except within KaTeX or Maxima environments;
The only non-standard convention we used during this design phase was to call on Maxima using \comp{<Maxima commands>}. This command works within all the other environments (which is desired).
Now that we are ready to start using the platform, it has become apparent that this temporary command for calling Maxima is cumbersome for our users. The vast majority of use cases involve simply calling a single variable or function, e.g.
As such, we have $\eval{function-name()}(\eval{variable-name})$
as opposed to actually using Maxima for computation, e.g.
Here, it is clear that $\eval{a} + \eval{b} = \eval{a+b}$
(where \eval{a+b} would return the actual sum, as calculated by Maxima).
As such, our users would prefer a delimiter-less command option for invoking a single variable or function, e.g. \#<variable-name-in-Maxima> and \#<function-name>(<argument>) (where # is some reserved character not used in the other languages), while also having a delimited alternative for the (much less frequent) cases where they actually want to use Maxima for computation; perhaps something like \#{a+b}.
However, we have a general sense that this is not a best practice, even though we can't foresee any specific issue.
"Research" / Comparisons:
Indeed, there is precedence for delimit-less expressions for single arguments like x^2 (on any calculator) or Knuth's a \over b in TeX (which persists in LaTeX with \frac12 being parsed as \frac{1}{2}.
IIRC Knuth's point was that this delimit-less notation was more semantic (and so, in his view, preferable), and because delimiters can be added, ambiguity can be avoided, whenever the need arises: e.g. x^{22}, {a+b}\over{c+d} and \frac{12}{3}.
The Question, Proper:
Can anyone point to or explain actual shortcomings / risks associated with a dual solution like:
\#<var>, \#<function>(<arg>) and,
\#[<extended expression>],
(where # is a reserved (& escapable) character), for calling one language amongst others, as opposed to only using a delimited command?
Any alternative suggestions for how to achieve the ease-of-use and more semantic code enabled by the above solution, while keeping the code unambiguous would be very much welcome and appreciated.

Operators and Functions [duplicate]

Is there any substantial difference between operators and methods?
The only difference I see is the way the are called, do they have other differences?
For example in Python concatenation, slicing, indexing are defined as operators, while (referring to strings) upper(), replace(), strip() and so on are methods.
If I understand question currectly...
In nutshell, everything is a method of object. You can find "expression operators" methods in python magic class methods, in the operators.
So, why python has "sexy" things like [x:y], [x], +, -? Because it is common things to most developers, even to unfamiliar with development people, so math functions like +, - will catch human eye and he will know what happens. Similar with indexing - it is common syntax in many languages.
But there is no special ways to express upper, replace, strip methods, so there is no "expression operators" for it.
So, what is different between "expression operators" and methods, I'd say just the way it looks.
Your question is rather broad. For your examples, concatenation, slicing, and indexing are defined on strings and lists using special syntax (e.g., []). But other types may do things differently.
In fact, the behavior of most (I think all) of the operators is constrolled by magic methods, so really when you write something like x + y a method is called under the hood.
From a practical perspective, one of the main differences is that the set of available syntactic operators is fixed and new ones cannot be added by your Python code. You can't write your own code to define a new operator called $ and then have x $ y work. On the other hand, you can define as many methods as you want. This means that you should choose carefully what behavior (if any) you assign to operators; since there are only a limited number of operators, you want to be sure that you don't "waste" them on uncommon operations.
Is there any substantial difference between operators and
methods?
Practically speaking, there is no difference because each operator is mapped to a specific Python special method. Moreover, whenever Python encounters the use of an operator, it calls its associated special method implicitly. For example:
1 + 2
implicitly calls int.__add__, which makes the above expression equivalent1 to:
(1).__add__(2)
Below is a demonstration:
>>> class Foo:
... def __add__(self, other):
... print("Foo.__add__ was called")
... return other + 10
...
>>> f = Foo()
>>> f + 1
Foo.__add__ was called
11
>>> f.__add__(1)
Foo.__add__ was called
11
>>>
Of course, actually using (1).__add__(2) in place of 1 + 2 would be inefficient (and ugly!) because it involves an unnecessary name lookup with the . operator.
That said, I do not see a problem with generally regarding the operator symbols (+, -, *, etc.) as simply shorthands for their associated method names (__add__, __sub__, __mul__, etc.). After all, they each end up doing the same thing by calling the same method.
1Well, roughly equivalent. As documented here, there is a set of special methods prefixed with the letter r that handle reflected operands. For example, the following expression:
A + B
may actually be equivalent to:
B.__radd__(A)
if A does not implement __add__ but B implements __radd__.

Regex getting the tags from an <a href= ...> </a> and the likes

I've tried the answers I've found in SOF, but none supported here : https://regexr.com
I essentially have an .OPML file with a large number of podcasts and descriptions.
in the following format:
<outline text="Software Engineering Daily" type="rss" xmlUrl="http://softwareengineeringdaily.com/feed/podcast/" htmlUrl="http://softwareengineeringdaily.com" />
What regex I can use to so I can just get the title and the link:
Software Engineering Daily
http://softwareengineeringdaily.com/feed/podcast/
Brief
There are many ways to go about this. The best way is likely using an XML parser. I would definitely read this post that discusses use of regex, especially with XML.
As you can see there are many answers to your question. It also depends on which language you are using since regex engines differ. Some accept backreferences, whilst others do not. I'll post multiple methods below that work in different circumstances/for different regex flavours. You can probably piece together from the multiple regex methods below which parts work best for you.
Code
Method 1
This method works in almost any regex flavour (at least the normal ones).
This method only checks against the attribute value opening and closing marks of " and doesn't include the possibility for whitespace before or after the = symbol. This is the simplest solution to get the values you want.
See regex in use here
\b(text|xmlUrl)="[^"]*"
Similarly, the following methods add more value to the above expression
\b(text|xmlUrl)\s*=\s*"[^"]*" Allows whitespace around =
\b(text|xmlUrl)=(?:"[^"]*"|'[^']*') Allows for ' to be used as attribute value delimiter
As another alternative (following the comments below my answer), if you wanted to grab every attribute except specific ones, you can use the following. Note that I use \w, which should cover most attributes, but you can just replace this with whatever valid characters you want. \S can be used to specify any non-whitespace characters or a set such as [\w-] may be used to specify any word or hyphen character. The negation of the specific attributes occurs with (?!text|xmlUrl), which says don't match those characters. Also, note that the word boundary \b at the beginning ensures that we're matching the full attribute name of text and not the possibility of other attributes with the same termination such as subtext.
\b((?!text|xmlUrl)\w+)="[^"]*"
Method 2
This method only works with regex flavours that allow backreferences. Apparently JGsoft applications, Delphi, Perl, Python, Ruby, PHP, R, Boost, and Tcl support single-digit backreferences. Double-digit backreferences are supported by JGsoft applications, Delphi, Python, and Boost. Information according this article about numbered backreferences from Regular-Expressions.info
See regex in use here
This method uses a backreference to ensure the same closing mark is used at the start and end of the attribute's value and also includes the possibility of whitespace surrounding the = symbol. This doesn't allow the possibility for attributes with no delimiter specified (using xmlUrl=http://softwareengineeringdaily.com/feed/podcast/ may also be valid).
See regex in use here
\b(text|xmlUrl)\s*=\s*(["'])(.*?)\2
Method 3
This method is the same as Method 2 but also allows attributes with no delimiters (note that delimiters are now considered to be space characters, thus, it will only match until the next space).
See regex in use here
\b(text|xmlUrl)\s*=\s*(?:(["'])(.*?)\2|(\S*))
Method 4
While Method 3 works, some people might complain that the attribute values might either of 2 groups. This can be fixed by either of the following methods.
Method 4.A
Branch reset groups are only possible in a few languages, notably JGsoft V2, PCRE 7.2+, PHP, Delphi, R (with PCRE enabled), Boost 1.42+ according to Regular-Expressions.info
This also shows the method you would use if backreferences aren't possible and you wanted to match multiple delimiters ("([^"])"|'([^']*))
See regex in use here
\b(text|xmlUrl)\s*=\s*(?|"([^"]*)"|'([^']*)'|(\S*))
Method 4.B
Duplicate subpatterns are not often supported. See this Regular-Expresions.info article for more information
This method uses the J regex flag, which allows duplicate subpattern names ((?<v>) is in there twice)
See regex in use here
\b(text|xmlUrl)\s*=\s*(?:(["'])(?<v>.*?)\2|(?<v>\S*))
Results
Input
<outline text="Software Engineering Daily" type="rss" xmlUrl="http://softwareengineeringdaily.com/feed/podcast/" htmlUrl="http://softwareengineeringdaily.com" />
Output
Each line below represents a different group. New matches are separated by two lines.
text
Software Engineering Daily
xmlUrl
http://softwareengineeringdaily.com/feed/podcast/
Explanation
I'll explain different parts of the regexes used in the Code section that way you understand the usage of each of these parts. This is more of a reference to the methods above.
"[^"]*" This is the fastest method possible (to the best of my knowledge) to grabbing anything between two " symbols. Note that it does not check for escaped backslashes, it will match any non-" character between two ". Whilst "(.*?)" can also be used, it's slightly slower
(["'])(.*?)\2 is basically shorthand for "(.*?)"|'(.*?)'. You can use any of the following methods to get the same result:
(?:"(.*?)"|'(.*?)')
(?:"([^"])"|'([^']*)') <-- slightly faster than line above
(?|) This is a branch reset group. When you place groups inside it like (?|(x)|(y)) it returns the same group index for both matches. This means that if x is captured, it'll get group index of 1, and if y is captured, it'll also get a group index of 1.
For simple HTML strings you might get along with
Url=(['"])(.+?)\1
Here, take group $2, see a demo on regex101.com.
Obligatory: consider using a parser instead (see here).

How do I know which is a function and which is an operator? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
MySQL have both functions and operators. However, it is not that clear for an arbitrary keyword whether it is a function or an operator.
For example, I believe ASCII() is a function (it appears in the string functions section of the manual). However, LIKE appears there as well, and it does not appear to be a function; for example, since the syntax does not force (...) after the LIKE keyword, and the docs mention that
By default, there must be no whitespace between a function name and the parenthesis following it.
In some cases it is that clear. For example, the IN keyword appears in the Comparison Functions and Operators section of the manual (a non-disclosing section name), and it appears there with the name IN() (as if it was a function), but the examples show SELECT 2 IN (0,3,5,7);, which hints that this is an operator (watch the space after the keyword).
In the same section there is INTERVAL(). Reading carefully shows the following line in the description of this keyword:
It is required that N1 < N2 < N3 < ... < Nn for this function to work correctly.
which hints that this is indeed a function, and not an operator. LEAST(), which also appears there, does not mention whether it is a function or an operator.
My questions are as follows:
Are there any internal differences between the concepts of function and of operator in MySQL?
Is there a way to figure out, given a keyword, whether it is a function of an operator?
Can a keyword be both, depending on context? I know that some keywords can both a function and a type, for example.
I wish to know that both in order to understand the abstract structure of MySQL, and in order to use it for syntax highlighting.
MySQL provides a list of "non-typed" operators here.
Basically, a function is followed by a list of arguments enclosed in parentheses. Even functions that don't take arguments, such as now() require the parentheses.
An operator, on the other hand, is part of the syntax of the MySQL query language. These are known to the parser, which recognizes them. Operators often use "infix" notation, where the operator appears between the arguments. However, this is not required (just consider the unary minus operator).
A cursory look at the list of operators shows that something can be both an operator and a function. An example is mod().
The most important difference to me is that users can define functions. But users cannot (yet) define operators. Unlike object oriented languages, SQL does not offer a way to provide additional definitions for operators.
And, for your purpose, you should peruse the manual pages to get the lists of things that you care about.

What is your system for avoiding keyword naming clashes?

Typically languages have keywords that you are unable to use directly with the exact same spelling and case for naming things (variables,functions,classes ...) in your program. Yet sometimes a keyword is the only natural choice for naming something. What is your system for avoiding/getting around this clash in your chosen technology?
I just avoid the name, usually. Either find a different name or change it slightly - e.g. clazz instead of class in C# or Java. In C# you can use the # prefix, but it's horrible:
int #int = 5; // Ick!
There is nothing intrinsically all-encompassing about a keyword, in that it should stop you from being able to name your variables. Since all names are just generalized instances of some type to one degree or another, you can always go up or down in the abstraction to find another useful name.
For example, if your writing a system that tracks students and you want an object to represent their study in a specific field, i.e. they've taken a "class" in something, if you can't use the term directly, or the plural "classes", or an alternative like "studies", you might find a more "instanced" variation: studentClass, currentClass, etc. or a higher perspective: "courses", "courseClass" or a specfic type attribute: dailyClass, nightClass, etc.
Lots of options, you should just prefer the simplest and most obvious one, that's all.
I always like to listen to the users talk, because the scope of their language helps define the scope of the problem, often if you listen long enough you'll find they have many multiple terms for the same underlying things (with only subtle differences). They usually have the answer ...
Paul.
My system is don't use keywords period!
If I have a function/variable/class and it only seems logical to name it with a keyword, I'll use a descriptive word in front of the keyword.
(adjectiveNoun) format. ie: personName instead of Name where "Name" is a keyword.
I just use a more descriptive name. For instance, 'id' becomes identifier, 'string' becomes 'descriptionString,' and so on.
In Python I usually use proper namespacing on my modules to avoid name clashes.
import re
re.compile()
instead of:
from re import *
compile()
Sometimes, when I can't avoid keyword name clashes I simply drop the last letter off the name of my variable.
for fil in files:
pass
As stated before either change class to clazz in Java/C#, or use some underscore as a prefix, for example
int _int = 0;
There should be no reason to use keywords as variable names. Either use a more detailed word or use a thesaraus. Capitalizing certain letters of the word to make it not exactly like the keyword is not going to help much to someone inheriting your code later.
Happy those with a language without ANY keywords...
But joke apart, I think in the seldom situations where "Yet sometimes a keyword is the only natural choice for naming something." you can get along by prefixing it with "my", "do", "_" or similar.
I honestly can't really think of many such instances where the keyword alone makes a good name ("int", "for" and "if" are definitely bad anyway). The only few in the C-language family which might make sense are "continue" (make it "doContinue"), "break" (how about "breakWhenEOFIsreached" or similar ?) and the already mentioned "class" (how about "classOfThingy" ?).
In other words: make the names more reasonable.
And always remember: code is WRITTEN only once, but usualy READ very often.
Typically I follow Hungarian Notation. So if, for whatever reason, I wanted to use 'End' as a variable of type integer I would declare it as 'iEnd'. A string would be 'strEnd', etc. This usually gives me some room as far as variables go.
If I'm working on a particular personal project that other people will only ever look at to see what I did, for example, when making an add-on to a game using the UnrealEngine, I might use my initials somewhere in the name. 'DS_iEnd' perhaps.
I write my own [vim] syntax highlighters for each language, and I give all keywords an obvious colour so that I notice them when I'm coding. Languages like PHP and Perl use $ for variables, making it a non-issue.
Developing in Ruby on Rails I sometime look up this list of reserved words.
In 15 years of programming, I've rarely had this problem.
One place I can immediately think of, is perhaps a css class, and in that case, I'd use a more descriptive name. So instead of 'class', I might use 'targetClass' or something similar.
In python the generally accepted method is to append an '_'
class -> class_
or -> or_
and -> and_
you can see this exemplified in the operator module.
I switched to a language which doesn't restrict identifier names at all.
First of all, most code conventions prevent such a thing from happening.
If not, I usually add a descriptive prose prefix or suffix:
the_class or theClass infix_or (prefix_or(class_param, in_class) , a_class) or_postfix
A practice, that is usually in keeping with every code style advice you can find ("long names don't kill", "Longer variable names don't take up more space in memory, I promise.")
Generally, if you think the keyword is the best description, a slightly worse one would be better.
Note that, by the very premise of your question you introduce ambiguity, which is bad for the reader, be it a compiler or human. Even if it is a custom to use class, clazz or klass and even if that custom is not so custom that it is a custom: it takes a word word, precisely descriptive as word may be, and distorts it, effectively shooting w0rd's precision in the "wrd". Somebody used to another w_Rd convention or language might have a few harsh wordz for your wolds.
Most of us have more to say about things than "Flower", "House" or "Car", so there's usually more to say about typeNames, decoratees, class_params, BaseClasses and typeReferences.
This is where my personal code obfuscation tolerance ends:
Never(!!!) rely on scoping or arcane syntax rules to prevent name clashes with "key words". (Don't know any compiler that would allow that, but, these days, you never know...).
Try that and someone will w**d you in the wörd so __rd, Word will look like TeX to you!
My system in Java is to capitalize the second letter of the word, so for example:
int dEfault;
boolean tRansient;
Class cLass;