TCL man page: it is better to place the comment section way ahead

I know I am being really picky here, but I'd like to throw this out in case I am off in my interpretation of the TCL man page. Actually, I wish I were wrong here, as you will see from the story below.
So for every new TCL developer, we recommend reading the famous "11 rules" (now there are 12).
Yesterday I was asked this question: why does the following script fail?
# puts "hello
world!"
Of course it fails, I said: the first line is taken as a comment, which leaves world!" as a command.
But, the newbie said, the man page indicates that the script is parsed in a certain order:
As #2 Evaluation states, the command is parsed into words first.
As #4 Double quotes states, a newline inside double quotes is taken literally. This makes hello and world! into one word, with a newline in between.
Comments at #10 does state that everything up to the next newline is ignored, but after the above processing, that newline should be the second one, the one after world!.
I see he had a point.
It would make more sense to move the comment section way ahead in the man page, maybe to the second section. With this change in order, it would indicate that comment recognition precedes the word-tokenizing process.
What do you think?
Again, I have no intention of asking for a change to the man page; I just want to make sure I am not missing anything in interpreting the bible.
[UPDATE]
To the people suggesting closing this question as not a technical question: it is the same as if my colleague came here asking why that script fails even though his understanding of the TCL man page says it is a good script.
Again, I am not asking to change the man page.
Let me re-phrase my question - when you are asked this same question, what flaw do you see in his reasoning?
[UPDATE2]
Thanks Donal. I think this is what I learnt: the TCL parser goes one character at a time, and there is no look-ahead.
This is another example:
puts [#haha]
This script fails in tclsh for the same reason: the TCL parser does not first break the script down and then parse only the string embedded inside the matching brackets; instead, it recognizes "#" as the start of a comment and ignores everything after it, including the closing bracket.

The rules in the Tcl(n) manual page describe pretty precisely the parser that Tcl uses. Requests to change it substantively are usually denied as they tend to have far-reaching consequences and interact with each other trickily. Verifying that a reordering of the rules is not substantive is a non-trivial task, as they correspond to quite a bit of code (our parser and a chunk of our bytecode compiler).
Adding non-normative sections (e.g., EXAMPLES) is easier.
Update based on your updated question
The problem with the reasoning is that the rules are a whole, not really a layered set of parts. They do interact with each other. (The one that usually trips people up is the interaction between the brace rule and the comment rule when inside a braced string such as a procedure body.) Comments really are true comments, and extend up to the end of the line (allowing for backslash-newline sequences) but not beyond, but they only start at places where commands start, not at other places with a # character, and that's the genuinely tricky bit.
Unfortunately, the way that the Tcl parser works is a bit different to the way that programmers think, but most of the time it's pretty good at pretending to work in a “reasonable fashion”. The tricky edge cases don't actually come up too often other than when dealing with the brace-comment interaction mentioned above. The other cases which I hit tend to be either with a switch (resolvable by just putting the comment in the arm) or with long literal lists of things where I want to comment some sections of the list; in that latter case, I actually post-process the string before using it as a list.
set exampleList {
a b c
d e f
# Not really a comment but I want to use it like one!
g h i
j k l
}
# Convert “comment” lines to empty lines
regsub -all -line "^\\s*#.*$" $exampleList "" exampleList
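For readers coming from other languages, the same "blank out comment-like lines, then treat the rest as a list" trick can be sketched in Python. This is only an illustration: splitting on whitespace approximates a plain Tcl list but does not handle braces or quotes the way Tcl's list parser does.

```python
import re

# Same literal as the Tcl example: a multi-line string used as a word list.
example_list = """
a b c
d e f
# Not really a comment but I want to use it like one!
g h i
j k l
"""

# Blank out every line whose first non-blank character is "#".
# [ \t]* (rather than \s*) keeps the match from running across line breaks.
cleaned = re.sub(r"^[ \t]*#.*$", "", example_list, flags=re.MULTILINE)

# Whitespace splitting stands in for Tcl's list parsing (no braces or quotes here).
words = cleaned.split()
print(words)
```

Without the substitution step, the words of the "comment" (starting with "#") would end up in the list as ordinary elements.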
The general advantage of Tcl's rules is that it is actually pretty easy to embed other languages within Tcl, precisely because Tcl only treats # (and other characters) as special in well-defined contexts. As long as the embedded language is one that uses balanced braces (and that's almost all of them in practice), embedding it is utterly trivial. The other cases have to use backslashes and/or double quotes and are pretty ugly, but they are also a minuscule fraction of all the embedding cases.
Your colleague's problem is that he's looking at the whole script in one go, whereas the Tcl parser handles one character at a time and doesn't do meaningful amounts of lookahead. It's just some dumb code.
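To make the "one character at a time" point concrete, here is a deliberately tiny Python model of just two rules: comments and double-quoted words. It is an illustration, not Tcl's real parser; braces, brackets, substitutions and backslash-newline continuation of comments are all left out, and the function name toy_tcl_commands is made up for this sketch. Because "#" is only checked at the moment a command starts, the quote that opens inside the comment is never seen as a quote.

```python
def toy_tcl_commands(script: str) -> list[list[str]]:
    """Split a script into commands (lists of words), scanning left to right.

    Simplified model: only comments, double-quoted words and bare words.
    """
    commands: list[list[str]] = []
    i, n = 0, len(script)
    while i < n:
        # Skip separators between commands.
        while i < n and script[i] in " \t\n;":
            i += 1
        if i >= n:
            break
        # "#" means "comment" only when it is the first character of a command.
        if script[i] == "#":
            while i < n and script[i] != "\n":
                i += 1
            continue
        words: list[str] = []
        while i < n and script[i] not in "\n;":
            if script[i] in " \t":
                i += 1
                continue
            if script[i] == '"':
                # A quoted word runs to the next '"', newlines included.
                end = script.find('"', i + 1)
                if end == -1:
                    raise SyntaxError("missing close-quote")
                words.append(script[i + 1:end])
                i = end + 1
            else:
                # A bare word runs to the next whitespace or separator;
                # '"' and '#' inside it are ordinary characters.
                start = i
                while i < n and script[i] not in " \t\n;":
                    i += 1
                words.append(script[start:i])
        commands.append(words)
    return commands


print(toy_tcl_commands('# puts "hello\nworld!"\n'))
print(toy_tcl_commands('puts "hello\nworld!"'))
```

The first call yields a single command whose only word is world!" (the comment ended at the first newline), which is why tclsh then fails on it; the second shows the same quoted text parsed as one word when no comment is involved.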

Related

Explain the difference between Docstring and Comment with an appropriate example in python? [duplicate]

I'm a bit confused over the difference between docstrings and comments in python.
In my class my teacher introduced something known as a 'design recipe', a set of steps that will supposedly help us students plan and organize our coding better in Python. From what I understand, the code below is an example of the steps we follow in this so-called design recipe (the stuff in the quotation marks):
def term_work_mark(a0_mark, a1_mark, a2_mark, ex_mark, midterm_mark):
    ''' (float, float, float, float, float) -> float
    Takes your marks on a0_mark, a1_mark, a2_mark, ex_mark and midterm_mark,
    calculates their respective weight contributions and sums these
    contributions to deliver your overall term mark out of a maximum of 55 (This
    is because the exam mark is not taken account of in this function)
    >>> term_work_mark(5, 5, 5, 5, 5)
    11.8
    >>> term_work_mark(0, 0, 0, 0, 0)
    0.0
    '''
    a0_component = contribution(a0_mark, a0_max_mark, a0_weight)
    a1_component = contribution(a1_mark, a1_max_mark, a1_weight)
    a2_component = contribution(a2_mark, a2_max_mark, a2_weight)
    ex_component = contribution(ex_mark, exercises_max_mark, exercises_weight)
    mid_component = contribution(midterm_mark, midterm_max_mark, midterm_weight)
    return (a0_component + a1_component + a2_component + ex_component +
            mid_component)
As far as I understand this is basically a docstring, and in our version of a docstring it must include three things: a description, examples of what your function should do if you enter it in the python shell, and a 'type contract', a section that shows you what types you enter and what types the function will return.
Now this is all well and good, but our assignments also require us to have comments, written with the '#' symbol, which explain the nature of our functions.
So, my question is, haven't I already explained what my function will do in the description section of the docstring? What's the point of adding comments if I'll essentially be telling the reader the exact same thing?
It appears your teacher is a fan of How to Design Programs ;)
I'd tackle this as writing for two different audiences who won't always overlap.
First there are the docstrings; these are for people who are going to be using your code without needing or wanting to know how it works. Docstrings can be turned into actual documentation. Consider the official Python documentation: what's available in each library and how to use it, with no implementation details (unless they directly relate to use).
Secondly there are in-code comments; these are to explain what is going on to people (generally you!) who want to extend the code. These will not normally be turned into documentation as they are really about the code itself rather than usage. Now there are about as many opinions on what makes for good comments (or lack thereof) as there are programmers. My personal rules of thumb for adding comments are to explain:
Parts of the code that are necessarily complex. (Optimisation comes to mind)
Workarounds for code you don't have control over, that may otherwise appear illogical
I'll admit to TODOs as well, though I try to keep that to a minimum
Places where I've chosen a simpler algorithm, noting where a better-performing (but more complex) option could go if performance in that section later becomes critical
Since you're coding in an academic setting, and it sounds like your lecturer is going for verbose, I'd say just roll with it. Use code comments to explain how you are doing what you say you are doing in the design recipe.
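As a small sketch of that split, the helper below keeps the "what" (type contract, description, examples) in the docstring and the "how" in # comments. The name contribution comes from the question's code, but its body here is an assumption (the question never shows it): mark divided by max_mark, scaled by weight.

```python
import doctest


def contribution(mark: float, max_mark: float, weight: float) -> float:
    """(float, float, float) -> float

    Return the share of weight earned by mark out of max_mark.

    >>> contribution(5, 10, 20)
    10.0
    >>> contribution(0, 10, 20)
    0.0
    """
    # Normalise to a 0..1 fraction first so marks on different scales
    # (e.g. out of 10 vs. out of 50) become comparable.
    fraction = mark / max_mark
    # Scale by the weight; the caller sums these shares into a term mark.
    return fraction * weight


if __name__ == "__main__":
    # Runs the >>> examples in the docstrings and reports any mismatches.
    doctest.testmod()
```

A side benefit of the design recipe's examples section: written with a space after >>>, the examples double as tests that doctest can run.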
I believe it's worth mentioning what PEP 8 says, i.e. the pure concept.
Docstrings
Conventions for writing good documentation strings (a.k.a. "docstrings") are immortalized in PEP 257.
Write docstrings for all public modules, functions, classes, and methods. Docstrings are not necessary for non-public methods, but you should have a comment that describes what the method does. This comment should appear after the def line.
PEP 257 describes good docstring conventions. Note that most importantly, the """ that ends a multiline docstring should be on a line by itself, e.g.:
"""Return a foobang
Optional plotz says to frobnicate the bizbaz first.
"""
For one liner docstrings, please keep the closing """ on the same line.
Comments
Block comments
Generally apply to some (or all) code that follows them, and are indented to the same level as that code. Each line of a block comment starts with a # and a single space (unless it is indented text inside the comment).
Paragraphs inside a block comment are separated by a line containing a single #.
Inline Comments
Use inline comments sparingly.
An inline comment is a comment on the same line as a statement. Inline comments should be separated by at least two spaces from the statement. They should start with a # and a single space.
Inline comments are unnecessary and in fact distracting if they state the obvious.
Don't do this:
x = x + 1  # Increment x
But sometimes, this is useful:
x = x + 1  # Compensate for border
Reference
https://www.python.org/dev/peps/pep-0008/#documentation-strings
https://www.python.org/dev/peps/pep-0008/#inline-comments
https://www.python.org/dev/peps/pep-0008/#block-comments
https://www.python.org/dev/peps/pep-0257/
First of all, for formatting your posts you can use the help options above the text area where you type your post.
And about comments and doc strings, the doc string is there to explain the overall use and basic information of the methods. On the other hand comments are meant to give specific information on blocks or lines, #TODO is used to remind you what you want to do in future, definition of variables and so on. By the way, in IDLE the doc string is shown as a tool tip when you hover over the method's name.
Quoting from this page http://www.pythonforbeginners.com/basics/python-docstrings/
Python documentation strings (or docstrings) provide a convenient way
of associating documentation with Python modules, functions, classes,
and methods.
An object's docstring is defined by including a string constant as the
first statement in the object's definition.
It's specified in source code that is used, like a comment, to
document a specific segment of code.
Unlike conventional source code comments the docstring should describe
what the function does, not how.
All functions should have a docstring
This allows the program to inspect these comments at run time, for
instance as an interactive help system, or as metadata.
Docstrings can be accessed by the __doc__ attribute on objects.
Docstrings can be accessed through a program (__doc__), whereas inline comments cannot be accessed.
Interactive help systems such as those in bpython and IPython can use docstrings to display the documentation during development, so that you don't have to visit the program every time.
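A minimal sketch of that runtime difference: the docstring is stored on the function object, while the comment is dropped when the code is compiled (the only way to see it again is to re-read the source file). The function area is made up for illustration.

```python
import inspect


def area(width: float, height: float) -> float:
    """Return the area of a width x height rectangle."""
    # Plain multiplication; this comment is discarded by the compiler.
    return width * height


# The docstring is stored on the function object...
print(area.__doc__)
# ...and inspect.getdoc() returns it with indentation cleaned up (this is what help() shows).
print(inspect.getdoc(area))
# The comment text is not kept anywhere in the compiled code object.
print(any("discarded" in str(c) for c in area.__code__.co_consts))
```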

Having Multiple Commands for Calling a Specific Programming Language: To Provide a Delimiter-less Option or Not?

After re-reading the off/on topic lists, I'm still not certain if this question is best posted to this site, so apologies in advance, if it is not.
Overview:
I am working on a project that mixes several programming languages and we are trying to determine important considerations for the command used to call one in particular.
For definiteness, I will list the specific languages; however, I think the principles ought to be general, so familiarity with these specific languages is not really essential.
Specific Context
Specifically, we are using Maxima, KaTeX, Markdown and HTML. While building the prototype, we have used the following (I believe, standard) conventions:
KaTeX delimited by $ $ or $$ $$;
HTML delimited by < > </ > pairs;
Markdown works anywhere in the body, except within KaTeX or Maxima environments;
The only non-standard convention we used during this design phase was to call on Maxima using \comp{<Maxima commands>}. This command works within all the other environments (which is desired).
Now that we are ready to start using the platform, it has become apparent that this temporary command for calling Maxima is cumbersome for our users. The vast majority of use cases involve simply calling a single variable or function, e.g.
As such, we have $\eval{function-name()}(\eval{variable-name})$
as opposed to actually using Maxima for computation, e.g.
Here, it is clear that $\eval{a} + \eval{b} = \eval{a+b}$
(where \eval{a+b} would return the actual sum, as calculated by Maxima).
As such, our users would prefer a delimiter-less command option for invoking a single variable or function, e.g. \#<variable-name-in-Maxima> and \#<function-name>(<argument>) (where # is some reserved character not used in the other languages), while also having a delimited alternative for the (much less frequent) cases where they actually want to use Maxima for computation; perhaps something like \#{a+b}.
However, we have a general sense that this is not a best practice, even though we can't foresee any specific issue.
"Research" / Comparisons:
Indeed, there is precedent for delimiter-less expressions for single arguments, like x^2 (on any calculator) or Knuth's a \over b in TeX (which persists in LaTeX, with \frac12 being parsed as \frac{1}{2}).
IIRC Knuth's point was that this delimiter-less notation was more semantic (and so, in his view, preferable), and because delimiters can be added, ambiguity can be avoided whenever the need arises: e.g. x^{22}, {a+b}\over{c+d} and \frac{12}{3}.
The Question, Proper:
Can anyone point to or explain actual shortcomings / risks associated with a dual solution like:
\#<var>, \#<function>(<arg>) and,
\#[<extended expression>],
(where # is a reserved (& escapable) character), for calling one language amongst others, as opposed to only using a delimited command?
Any alternative suggestions for how to achieve the ease-of-use and more semantic code enabled by the above solution, while keeping the code unambiguous would be very much welcome and appreciated.
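One way to probe the risks is to write the tokenizer that the dual syntax implies. The Python sketch below rests on assumptions that are not in the question: identifiers match [A-Za-z_][A-Za-z0-9_]*, arguments contain no nested parentheses, the bracketed form contains no nested brackets, and escaping is ignored. It is not Maxima's grammar or any existing tool, but it makes a few hazards concrete: whitespace becomes significant (\#f(x) is a call, \#f (x) is a variable followed by text), greedy name matching can swallow characters meant for KaTeX (\#x_1 reads as the variable x_1), and a naive [...] delimiter cuts off at the first ] even though Maxima itself uses square brackets for lists and subscripts.

```python
import re

# Hypothetical syntax from the question, with "#" as the reserved character:
#   \#name        -> a Maxima variable
#   \#name(arg)   -> a Maxima function call (no nested parentheses here)
#   \#[expr]      -> an arbitrary Maxima expression (no nested brackets here)
TOKEN = re.compile(
    r"\\#(?:"
    r"\[(?P<expr>[^\]]*)\]"                  # delimited form
    r"|(?P<name>[A-Za-z_][A-Za-z0-9_]*)"     # bare name...
    r"(?:\((?P<arg>[^()]*)\))?"              # ...optionally followed by (arg)
    r")"
)


def tokenize(source: str) -> list[tuple[str, ...]]:
    """Split source into ('text', s), ('var', name), ('call', name, arg), ('expr', e)."""
    out: list[tuple[str, ...]] = []
    pos = 0
    for m in TOKEN.finditer(source):
        if m.start() > pos:
            out.append(("text", source[pos:m.start()]))
        if m.group("expr") is not None:
            out.append(("expr", m.group("expr")))
        elif m.group("arg") is not None:
            out.append(("call", m.group("name"), m.group("arg")))
        else:
            out.append(("var", m.group("name")))
        pos = m.end()
    if pos < len(source):
        out.append(("text", source[pos:]))
    return out


print(tokenize(r"Here $\#a + \#b = \#[a+b]$"))
print(tokenize(r"\#f(x)"), tokenize(r"\#f (x)"), tokenize(r"\#x_1"))
```

None of these is a blocker on its own, but each needs an explicit, documented rule (and a real implementation would have to balance brackets rather than stop at the first ]).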

Why doesn't compiler automatically brace expr?

I was reading through Donal Fellows' excellent explanation on why you should brace your expr.
It makes me wonder why doesn't the tcl compiler automatically brace {} the expr? What are conditions under which this automatic bracing will fail?
Some random thoughts on this...
One of the strengths of the language is that evaluation is governed by a small number of rules which are applied uniformly. To have the arguments to expr automatically braced would mean that there would have to be an exception in the rules for this command.
It would also mean that the interpreter would have to be rewritten, since it evaluates the arguments before it determines which command it is dealing with. The expr command could be renamed or aliased, making it harder for the interpreter to figure out when to autobrace.
It would mean that if someone wanted double evaluation of the arguments, or wanted to construct the argument in a way that does not directly match the expression tokens, they would have to work around the automatic bracing.
It would mean possible incompatibilities with a lot of old code.
To have the interpreter as a "back seat driver" isn't really the Tcl Way.
It makes me wonder why doesn't the tcl compiler automatically brace {} the expr?
It actually did so during some of the alphas for 8.0. It was removed because the wider community absolutely hated it. The core Tcl language — the 12 rules on the Tcl(n) manual page — is kept extremely small and simple, and it's applied uniformly to everything.
No special cases. That is what most Tcl programmers want.
If I remember right, the auto-bracing was particularly considered for if and while. With if, it was because omitting the braces for a test of the result of a command was a particularly common practice with Tcl 7 because it was faster (the expression engine was rather embarrassingly slow). With while, it was because omitting the braces was a common user bug.
What are conditions under which this automatic bracing will fail?
Well, it would be noticeable in something like this:
set a 1
set b 2
set op +
set c [expr $a$op$b]
Right now, that sets c to 3. With auto-bracing, it would be a syntax error (since the expression grammar doesn't have anything it can do with three variables in a row). A work-around was proposed:
set c [eval expr $a$op$b]
But frankly, getting everyone to brace their expressions except when they really wanted double-substitution (and runtime expression compilation) was considered to be better.
Double-substitution is almost always an indication of a security bug, and it's always an indication of a performance problem; there's virtually always a better (faster, safer) way to do it. Brace your expressions. It's safer and faster and really easy to do: what's not to like?
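The same trade-off shows up outside Tcl. As a rough Python analogy (not Tcl code), splicing values into source text and evaluating it mirrors unbraced [expr $a$op$b], while keeping the operator as data avoids re-parsing entirely.

```python
import operator

a, b, op = 1, 2, "+"

# Double substitution: splice the values into source text, then parse it again.
# This mirrors unbraced [expr $a$op$b], where Tcl substitutes first and expr re-parses.
via_eval = eval(f"{a}{op}{b}")

# Keeping the operator as data: a fixed lookup table, no re-parsing of strings.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
via_table = OPS[op](a, b)

print(via_eval, via_table)  # 3 3

# An untrusted "operator" would be executed as code by eval(),
# but here it is merely an unknown key.
bad_op = "+ __import__('os').getcwd() +"
print(OPS.get(bad_op))  # None
```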

Matching nested constructs in TextMate / Sublime Text / Atom language grammars

While writing a grammar for Github for syntax highlighting programs written in the Racket language, I have stumbled upon a problem.
In Racket #| starts a multiline comment and |# ends it.
The problem is that multiline comments can be nested:
#| a comment #| still a comment |# even
more comment |#
Here is my non-working attempt:
repository:
  multilinecomment:
    begin: \#\|
    end: \|\#
    name: comment
    contentName: comment
    patterns:
      - include: "#multilinecomment"
        name: comment
      - match: ([^\|]|\|(?=[^#]))*
        name: comment
The intent of the match patterns are:
"#multilinecomment"
A multiline comment can contain another multiline comment.
([^\|]|\|(?=[^#]))*
The meaning of the subexpressions:
[^\|] any character other than `|`
\|(?=[^#]) an `|` followed by a non-`#`
The entire expression thus matches a string not containing |#
Update:
Got an answer from Allan Odgaard on the TextMate mailing list:
http://textmate.1073791.n5.nabble.com/TextMate-grammars-and-nested-multiline-comments-td28743.html
So I've tested a bunch of languages in Sublime that have multiline comments (C/C++, Java, HTML, PHP, JavaScript), and none of the language syntaxes support multiline comments embedded in multiline comments - the syntax highlighting for the comment scope ends with the first "comment close" marker, not with symmetric markers. Now, this isn't to say that it's impossible, because the BracketHighlighter plugin works great for matching symmetric tags, brackets, and other markers. However, it's written in Python, and uses custom logic for its matching algorithms, something that may not be available in the Oniguruma engine that powers Sublime's syntax highlighter, and apparently Github's as well.
Basically, from your description of the problem, you need a code parser to ensure that nested comments are legal, something you can't do with just a syntax highlighting definition. If you're writing this just for Sublime, a custom plugin could take care of that, but I don't know enough about Github's Linguist syntax highlighting system to say if you're allowed to do that. I'm not a regex master yet, but it seems to me that it would be rather difficult to achieve this purely by regex, as you'd need to somehow keep track of an arbitrary number of internal symmetric "open" and "close" markers before finding (and identifying!) the final one.
Sorry I couldn't provide a definitive answer other than I'm not sure this is possible, but that's the best I can come up with without knowing more about Sublime's and Github's internals, something that (at least in Sublime's case) won't happen unless it's open-sourced. Good luck!
Old post, and I don't have the reputation for a comment, but it is emphatically NOT possible to detect arbitrarily nested comments using purely regular expressions. Intuitively, this is because all regular expressions can be transformed into a finite state machine, and keeping track of nesting depth requires a (theoretically) infinite amount of state (the number of states needs to be equal to at least the different possible nesting depths, which here is infinite).
In practice this number grows very slowly, so if you don't want to go to too much trouble you could probably write something that allows nesting up to a reasonable depth. Otherwise you'll probably need a separate phase that parses through and finds the comments to tell the syntax highlighter to ignore them.
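The "separate phase" can be as small as a scanner with one integer of state, which is exactly the unbounded counter a regular expression cannot hold. This Python sketch handles only #| and |# (it ignores strings and ; line comments, so it is not a Racket lexer), and for contrast shows a lazy regex stopping at the inner closer.

```python
import re

SAMPLE = "(define x 1) #| a comment #| still a comment |# even\nmore comment |# (f x)"


def block_comment_spans(src: str) -> list[tuple[int, int]]:
    """Return (start, end) offsets of outermost #| ... |# comments, allowing nesting.

    Unterminated comments are simply not reported.
    """
    spans: list[tuple[int, int]] = []
    depth = 0
    start = 0
    i = 0
    while i < len(src) - 1:
        pair = src[i:i + 2]
        if pair == "#|":
            if depth == 0:
                start = i
            depth += 1
            i += 2
        elif pair == "|#" and depth > 0:
            depth -= 1
            i += 2
            if depth == 0:
                spans.append((start, i))
        else:
            i += 1
    return spans


print(block_comment_spans(SAMPLE))
# A single regex with a lazy body stops at the first "|#", i.e. the inner closer.
lazy = re.search(r"#\|.*?\|#", SAMPLE, re.DOTALL)
print(repr(lazy.group()))
```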
You had the correct idea but it looks like your second pattern also matches for the "begin nested comment" sequence #| which will never give a chance for your recursive #multilinecomment pattern to kick in.
All you have to do is replace your second pattern with something similar to
(#(?=[^|])|\|(?=[^#])|[^|#])+
Take the last match out. You do not need it. It's redundant with what TextMate does naturally, which is to match all additional text into the comment scope until the end marker comes along, or until the entire pattern recurses upon itself.
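The difference between the two inner patterns can be checked at the plain-regex level with Python's re, which supports the same lookaheads used here. This only exercises the regexes in isolation; it does not model how a TextMate-style engine interleaves begin/end rules with patterns.

```python
import re

# Inner-content pattern from the question, and the replacement from this answer.
OLD = re.compile(r"([^\|]|\|(?=[^#]))*")
NEW = re.compile(r"(#(?=[^|])|\|(?=[^#])|[^|#])+")

body = " a comment #| still a comment |# even"

# The old pattern swallows the nested opener "#|", so the recursive include never fires.
print(repr(OLD.match(body).group()))
# The new pattern stops right before "#|", leaving it for the nested rule.
print(repr(NEW.match(body).group()))
```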

Iterating over a string in Vimscript or Parse a JSON file

So I'm creating a vim script that needs to load and parse a JSON file into a local object graph. I searched and I couldn't find any native way to process a JSON file, and I don't want to add any dependencies to the script. So I wrote my own function to parse the JSON string (gotten from the file), but it's really slow. At the moment, I iterate through each character in the file like so:
let len = strlen(jsonString) - 1
let i = 0
while i < len
  let c = strpart(jsonString, i, 1)
  let i += 1
  " A lot of code to process file....
  " Note: I've tried short-cutting the process by searching for the enclosing double quote when I come across an opening double quote (also taking into account the escaping '\' character). It doesn't help
endwhile
I've also tried this method:
for c in split(jsonString, '\zs')
  " Do a lot of parsing ....
endfor
For reference, a file with ~29,000 characters takes about 4 seconds to process, which is unacceptable.
Is there a better way to iterate over a string in vim script?
Or better yet, have I missed a native function to parse JSON?
Update:
I asked for no dependencies because I:
Didn't want to deal with them
Genuinely wanted some ideas for best way to do this without someone else's work.
Sometimes I just like to do things manually even though the problem has already been solved.
I'm not against plugins or dependencies at all, it's just that I'm curious. Thus the question.
I ended up creating my own function to parse the JSON file. I was creating a script that could parse the package.json file associated with node.js modules. Because of this, I could rely on a fairly consistent format and quit the processing whenever I'd retrieved the information I needed. This usually cut out large chunks of the file, since most developers put the largest chunk of the file, their "readme" section, at the end. Because the package.json file is strictly defined, I left the process somewhat fragile. It assumed a root dictionary { } and actively looked for certain entries. You can find the script here: https://github.com/ahayman/vim-nodejs-complete/blob/master/after/ftplugin/javascript.vim#L33.
Of course, this doesn't answer my own question. It's only the solution to my unique problem. I'll wait a few days for new answers and pick the best one before the bounty ends (already set an alarm on my phone).
The simplest solution with the least dependencies is just using the json_decode vim function.
let dict = json_decode(jsonString)
Even though Vim's origins date back a long way, its internal string()/eval() representation happens to be so close to JSON that it's likely to work unless you need special characters.
You can lookup the implementation here which even supports true/false/null if you want:
https://github.com/MarcWeber/vim-addon-json-encoding
Better use that library (vim-addon-manager allows to install dependencies easily).
Now it depends on your data whether this is good enough.
Now Benjamin Klein posted your question to vim_use which is why I'm replying.
The best and fastest replies come if you subscribe to the Vim mailing list.
Go to vim.sf.net and follow the community link.
You cannot expect the Vim community to scrape Stack Overflow.
I've added the keywords "json" and "parsing" to that little code so that it can be found more easily.
If this solution does not work for you you can try the many :h if_* bindings or write an external script which extracts the information you're looking for, or turns JSON into Vim's dictionary representation which can be read by eval() escaping special characters you care about correctly.
If you are seeking a completely correct solution, omitting dependencies is one of the worst things you can do. The eval() variant mentioned by @MarcWeber is one of the fastest, but it has its disadvantages:
Using the solution for securing eval that I mentioned in a comment means it is no longer the fastest. In fact, after you use it, eval() becomes slower by more than an order of magnitude (0.02s vs 0.53s in my test).
It does not respect surrogate pairs.
It cannot be used to verify that you have correct JSON: it accepts some strings (e.g. "\<C-o>") that are not JSON strings and it allows trailing commas.
It fails to give normal error messages. It fails badly if you use vam#VerifyIsJSON I mentioned in p.1.
It fails to load floating point values like 1e10 (vim requires numbers to look like 1.0e10, but numbers like 1e10 are allowed: note “and/or” in the first paragraph).
All of the above statements (except for the first) also apply to vim-addon-json-encoding mentioned by @MarcWeber, because it uses eval. There are some other possibilities:
Fastest and most correct is using Python: pyeval('json.loads(vim.eval("varname"))'). It is not faster than eval, but it is the fastest among the other possibilities (0.04s in my test: approximately two times slower than eval()).
Note that I use pyeval() here. If you want solution for vim version that lacks this functionality it will no longer be one of the fastest.
Use my json.vim plugin. It has the advantage of slightly better error reporting compared to a failed vam#VerifyIsJSON (slightly worse compared to eval()), and it correctly loads floating-point numbers. It can be used to verify strings (it does not accept "\<C-a>"), but it loads lists with a trailing comma just fine. It does not support surrogate pairs. It is also very slow: in the test I used (with a 279702-character-long string) it takes 11.59s to load. json.vim tries to use Python if possible, though.
For the best error reporting you can take yaml.vim and purge YAML support from it, leaving only JSON (I once did the same thing for pyyaml, though in Python: see the markedjson library used in powerline, which is pyyaml minus the YAML stuff plus classes with marks). But this variant is even slower than json.vim and should only be used if the main thing you need is error reporting: 207 seconds to load the same 279702-character-long string.
Note that the only variant mentioned that satisfies both requirements “no dependencies” and “no python” is eval(). If you are not fine with its disadvantages you have to throw away one or both of these requirements. Or copy-paste code. Though if you take speed into account only two candidates are left: eval() and python: if you want to parse json fast you really must use C and only these solutions spend most time in functions written in C.
Most other interpreters (ruby/perl/TCL) do not have pyeval() equivalent so they will be slower even if their JSON implementation is written in C. Some other (lua/racket (mzscheme)) have pyeval() equivalent, but e.g. luaeval('{}') is zero meaning that you will have to add additional step explicitly and recursively converting objects into vim dictionaries and lists (e.g. luaeval('vim.dict({})')) which will impact performance. Cannot say anything about mzeval(), but I have never heard about anybody actually using racket (mzscheme) with vim.
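The eval() shortcut versus a real parser is not specific to Vim. As a Python analogy (not Vim code), ast.literal_eval plays the role of Vim's eval() trick: it parses the host language's literals, so it is lenient where JSON is strict (trailing commas, single-quoted strings) and fails on JSON's own true/null, while json.loads follows the JSON grammar.

```python
import ast
import json


def compare(text: str) -> tuple[object, object]:
    """Parse text with json.loads (strict JSON) and ast.literal_eval (Python literals)."""
    try:
        strict = json.loads(text)
    except json.JSONDecodeError:
        strict = "rejected"
    try:
        loose = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        loose = "rejected"
    return strict, loose


for sample in ["[1, 2,]", "'single quotes'", "[true, null]"]:
    print(sample, compare(sample))
```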