Comment wizardry - language-agnostic

I am not a level 60 wizard yet, so I've got a question based on the following sentence in the Wikipedia article on magic:
Any comment that has an effect on the code is magic.
How is that? In what languages is such a thing possible? More specifically, what effect can a comment have on the code?

Magic comments take various forms. In Ruby, a comment can be placed at the top of the file, which specifies an encoding that the interpreter will use when reading in the file:
# -*- coding: UTF-8 -*-
In Unix systems, the first line can be a shebang. In any language that uses octothorpes (#) for comments, this amounts to a comment that the OS reads to decide which interpreter should run the file:
#!/bin/bash
echo hello from bash
In Haskell, certain types of comments known as pragmas can be used to switch on (or off) language features.
{-# LANGUAGE OverlappingInstances, NoTypeFamilies #-}
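Python has the same kind of magic comment as the Ruby encoding line above (PEP 263). A coding comment on the first or second line tells the interpreter how to decode the file, and it may follow a shebang. Here is a small sketch using the standard library's tokenize.detect_encoding, which performs that lookup. The sample sources are made up for illustration:

```python
import io
import tokenize

def source_encoding(source: bytes) -> str:
    """Return the encoding Python would use to decode this source file."""
    # detect_encoding reads at most two lines, looking for a BOM or a
    # PEP 263 "coding" magic comment; without one it defaults to UTF-8.
    encoding, _consumed_lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding

# A shebang on line 1 and an encoding magic comment on line 2.
shebang_and_cookie = b"#!/usr/bin/env python3\n# -*- coding: latin-1 -*-\nprint('hi')\n"
plain = b"print('hi')\n"

print(source_encoding(shebang_and_cookie))  # iso-8859-1 (normalized name for latin-1)
print(source_encoding(plain))               # utf-8
```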

I wouldn't worry too much about that remark in Wikipedia. I think the intention is just to say that a comment "should not" affect behavior (since that's what comment means), so if it does then it must be some kind of special case and hence is "magic" by the definition in use in that section of the article. The author of that line of the article didn't necessarily have any particular language in mind.
If you fancy being pedantic, then a comment can easily affect program behavior in C (in this example by containing a number of line breaks):
#include <stdio.h>
int main() {
/* If this multi-line comment were deleted, then
the line-numbers in the remainder of the file
would be smaller. Boring magic.
*/
printf("%d\n", __LINE__);
}
I'm pretty sure that this is not what the author of the remark intended, so the remark isn't precise. It's special cases of the comment syntax that are "magic".
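The same "boring magic" can be reproduced in Python, where a frame's current line number plays the role of __LINE__. This sketch compiles two made-up snippets that differ only in their comment lines:

```python
def reported_line(source: str) -> int:
    """Run a snippet and return the line number it recorded for itself."""
    namespace: dict = {}
    exec(compile(source, "<demo>", "exec"), namespace)
    return namespace["line"]

with_comments = (
    "import inspect\n"
    "# If these two comment lines were deleted,\n"
    "# the reported line number would be smaller.\n"
    "line = inspect.currentframe().f_lineno\n"
)
# Same code with the comment lines stripped out.
without_comments = "\n".join(
    ln for ln in with_comments.splitlines() if not ln.startswith("#")
) + "\n"

print(reported_line(with_comments))     # 4
print(reported_line(without_comments))  # 2
```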

You do also get those "magic" comments...
That commented-out section where everything works fine if you leave those comments in. You remove them and then poof, nothing works. (This is usually caused by one line in the middle of the commented section having been left uncommented.)
(Commenting is Magic)

Looking for good bracket characters for a template engines code blocks

I am looking for a good character pair to use for enclosing template code within a template for the next version of our in-house template engine.
The current one uses plain {} but this makes the parser very complex to be able to distinguish between real code blocks and random {} chars in the literal text in the template.
I think a two-character combination like the one used in ASP.NET or PHP is a better approach, but the question is which character pair I should use, or whether there is some perfect single character that is never used and that's easy to type.
Some criteria that need to be fulfilled:
Cannot be changed by HTMLEncode: the sources will be editable through web-based HTML editors and plain textareas and need to stay the same no matter what editor is used.
Regex will be used to clean code parts after editing in an HTML editor that might have encoded the internal part of the code block like & chars.
Should be reasonably easy to type on both English and Swedish keyboard layouts.
Should be a very rare combination: the template will generate HTML and text and could include CSS and JavaScript literal text with JSON, so any combination that might collide with those is bad unless very rare. That means {{}} is out, as it can occur in JSON.
The code within the code block will contain spaces, underscores, dollar and many more combinations, not only fieldnames but if/while constructs as well.
The parser is generated with ANTLR.
I am looking for suggestions and objections to find one or more combinations that would work in as many situations as possible, possibly multiple alternative pairs for different situations.

Template-Toolkit defaults to [% template directives %], which works reasonably well.
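Here is a regex-based sketch of why a two-character pair such as [% ... %] simplifies the lexing problem in the question. It is not ANTLR, which the poster's parser actually uses. It splits template text into literal and directive parts and leaves JSON braces in the literal text alone. The sample template is hypothetical:

```python
import re

# Template-Toolkit-style directives: "[%" ... "%]".
# Non-greedy, so this sketch assumes "%]" never occurs inside a directive
# (for example inside a quoted string within the code block).
DIRECTIVE = re.compile(r"\[%(.*?)%\]", re.DOTALL)

def split_template(text: str) -> list[tuple[str, str]]:
    """Split template text into ("text", ...) and ("code", ...) tokens."""
    tokens = []
    pos = 0
    for match in DIRECTIVE.finditer(text):
        if match.start() > pos:
            tokens.append(("text", text[pos:match.start()]))
        tokens.append(("code", match.group(1).strip()))
        pos = match.end()
    if pos < len(text):
        tokens.append(("text", text[pos:]))
    return tokens

template = 'var cfg = {"a": {"b": 1}}; [% IF user %]Hi [% user.name %][% END %]'
for kind, value in split_template(template):
    print(kind, repr(value))
```

A literal "[%" in the template text would still need some escape convention, but that sequence is far rarer in HTML, CSS and JSON than a bare brace.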

Is it actually possible to parse freeform HTML with a regular expression?

Now, before you prepare to write a speech about the perils of parsing HTML with regex: I already know. This is more of a curiosity question than something I actually need for practical use.
Basically, given a file of HTML in some random, but perfectly valid, format, can you parse out the content of <p> tags using a half-sane number of regular expressions? (Assume that <p> tags cannot be nested, or some other minor limitation.)

It's certainly possible to extract all the text between {insert character sequence 1 here} and {insert character sequence 2 here} with regular expressions, so long as those sequences aren't overlapping. For example:
/(?<={insert character sequence 1 here}).*?(?={insert character sequence 2 here})/
Of course, it's terribly brittle and will break horribly if what you're running it on is even slightly malformed, or contains either character sequence outside the context where it's meaningful, or any number of other ways. If you oversimplify the problem, then yes you can get away with an oversimplified solution.
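Here is a minimal Python sketch of the non-overlapping-delimiters case for <p> tags, under the question's simplifying assumptions. It uses a capturing group instead of a lookbehind/lookahead pair. Python's re module only supports fixed-width lookbehind, so it cannot match "<p" followed by arbitrary attributes that way. The sample strings are made up:

```python
import re

# Matches <p ...>content</p>. Assumes no nested <p> elements, that every
# paragraph has an explicit </p> (valid HTML allows omitting it), and that
# "</p>" never appears inside comments, scripts, or attribute values.
P_CONTENT = re.compile(r"<p\b[^>]*>(.*?)</p\s*>", re.IGNORECASE | re.DOTALL)

def paragraph_contents(html: str) -> list[str]:
    return P_CONTENT.findall(html)

sample = '<div><p class="intro">First</p>\n<pre>not a paragraph</pre><P>Second\nline</P></div>'
print(paragraph_contents(sample))

# The brittleness mentioned above: a commented-out end tag fools the regex.
tricky = "<p>Real<!-- </p> --> text</p>"
print(paragraph_contents(tricky))
```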

Yes, under restrictions like valid HTML and non-nesting, you can use regular expressions for certain uses.

It depends on which limitations you'd consider minor. XHTML, for one obvious example, is somewhat more amenable to simple parsing. A great deal depends on whether you're thinking in terms of parsing existing HTML, or generating new HTML that could be parsed relatively easily. For the former case, I'd say the restrictions were major -- i.e., you'd need to know a great deal about the specific HTML in question to parse it. For the latter case, I'd say the restrictions were fairly trivial -- i.e., they would only involve how you write the HTML, but would not affect what you could express in HTML.

Optimize and compress HTML

I have a few hand-crafted web pages. When deploying them I would like to run them through a tool so that new smaller HTML files are created, with extraneous whitespace taken out, etc.
We already use YUICompressor for our Javascript and our CSS, and we tend to follow all of the techniques described by the Yahoo performance team.
Is there a good, free tool that does this? I prefer tools that would fit into our deployment process similarly to YUICompressor.

HTML Tidy does the job.
I use the following on one document that I generate (a rather large one). This saved me about 10% on the post-gzip size.
tidy -c -omit -ashtml -utf8 --doctype strict \
--drop-proprietary-attributes yes --output-bom no \
--wrap 0 source.html > target.html
-c — Replace surplus presentational tags and attributes
-omit — Drop optional end tags
-ashtml — use HTML rather than XHTML (HTML is leaner and XHTML provides no benefits for most use cases)
-utf8 — So we don't have to use entities for characters outside the character set (entities are more bytes)
--doctype strict — use Strict (again, leaner)
--drop-proprietary-attributes yes — get rid of proprietary junk
--output-bom no — BOMs cause issues in some clients
--wrap 0 — Have very long lines

Plain old minify will also attack your HTML for you, if you want.
But HTML minification isn't, generally, hugely effective:
Taking runs of whitespace down to one won't do that much. If you're already using gzip/deflate, that will compress the whitespace quite efficiently. You can't remove all whitespace, because a single space often affects rendering in a way you want to keep.
Taking comments out may have an effect, depending on how much comment content you actually have. But you'd have to be careful not to hit conditional comments.
Apart from that, there is not much in an HTML document that can be ‘minified’. Obviously the JS idea of packing variable names down to the shortest possible string is inapplicable.
Doing all this with regex, as most minifiers do, is a bit dodgy. You have to stick to a limited ‘normal’ range of markup that won't trip it up.
With HTML minification you're typically getting less gain (and less post-gzip gain) than JS/CSS minification, and for dynamically-generated pages you have more overhead (as you can't pre-minify them like with static scripts/styles). Some templating languages may already have built-in features for trimming whitespace at generation time; if available in your environment, use that.
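This sketch gives a feel for the gzip point above. It collapses whitespace naively and compares raw and gzipped sizes for a made-up, heavily indented page. The collapsing regex is deliberately crude and would also mangle <pre> and <textarea> content. The exact numbers depend on the input, so none are quoted here:

```python
import gzip
import re

def collapse_whitespace(html: str) -> str:
    # Crude: every run of whitespace becomes one space. A real minifier
    # must leave <pre>, <textarea>, and inline scripts alone.
    return re.sub(r"\s+", " ", html)

def size_report(html: str) -> dict[str, int]:
    raw = html.encode("utf-8")
    small = collapse_whitespace(html).encode("utf-8")
    return {
        "raw": len(raw),
        "minified": len(small),
        "raw_gzip": len(gzip.compress(raw)),
        "minified_gzip": len(gzip.compress(small)),
    }

# Hypothetical hand-indented page: 200 list items, 40 spaces of indentation each.
page = "<ul>\n" + "".join(f"{' ' * 40}<li>Item {i}</li>\n" for i in range(200)) + "</ul>\n"
report = size_report(page)
print(report)
print("saved before gzip:", report["raw"] - report["minified"])
print("saved after gzip: ", report["raw_gzip"] - report["minified_gzip"])
```

On input like this the post-gzip saving is expected to be a small fraction of the pre-gzip saving, which is the answer's argument. Run it on your own pages before deciding whether an HTML minifier earns its place in the deployment step.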

Terminology for the opposite of commented out

When a piece of code is commented we say just that, it's "commented out".
But when it's not commented out, what is that?
Uncommented isn't quite the same. Active?
It's definitely not commented in.
What's the best way to refer to the act of de-commenting out code?

Uncommented is the most common word for that.

I call code which isn't "commented out" simply "code"

Visual Studio has a function called "Comment out the selected lines".
The opposite function is called "Uncomment the selected lines". I use the term "uncommented."

A couple of possibilities:
Live code
Legacy code

I wish it were "uncommented in."
I use "revived" or "restored" for code that used to be commented out, but no longer is. I use "live" or "uncommented" for code that's intended to be compiled or executed.

If it was once commented out in the past, then by uncommenting it it will become
Decommented.
(You can recomment it if you want, but I wouldn't recommend that.)
If it was never ever commented out, it's just
code.
These are, to me now, the several different (*) states live code can have. (Or maybe it's just Yoda English.)
*: implicit potential meta- [add your Latin-born term here :P].

It's just a piece of un-out-commented, un-disabled, un-erased, un-inactivated, un-unused, un-deprecated, un-obsolete, un-rewritten, un-archived code.
Or, in layman's terms, code.

Think of an editorial document. It's kind of like asking "what is the undeleted writing called?" Just, the writing.

Although it is widely used to refer to 'live' code, 'Uncommented code' is an ambiguous term.
It could refer to code that was once commented out, but has been edited to allow it to be run, or it could refer to code that has no descriptive comments.

I call commented-out code... poor code. If it is to be commented out, then just toss it altogether. Your version control system should keep track of the various states of your previous work. Commenting it out and leaving others to figure out later why it was commented out doesn't seem like good coding practice!

Seeing how many disparate answers exist for this question, there are a few things you should do.
Pick a term and stick with it.
Consistency is most important. It's okay if it's not the best one ever, and you can change your choice if a more obvious term appears in the distant future.
Explicitly describe that term to your teammates.
Don't just say, "it means uncommented, or whatever you call it." Don't just say, "it's the opposite of commented." Tell them that it means code that was previously commented out, and has now had its commenting syntax removed. Tell them that this code is now active and will execute when called. Never assume that your team is smart enough to "just get it" just because they nod when you use the term.
As a subjective answer, I use the term uncommented. It's a bad name for the behavior, but at least it's mildly intuitive. It beats nonsense like unhashed for languages that use the # character for comments.

I honestly do say "commented in". In phrases like "well, it works with that line commented out, so let's comment it back in and re-run the test". "Uncomment" would be more correct, but sounds clunkier. I wouldn't use that expression in formal writing, though, just when talking to my pair.

During a debugging session, I often comment and uncomment lines of code.
From a purely semantic (and yes, anally pedantic) perspective, it's the action that's described by the phrase, not the code.
"Commented Out" is a verb describing the action whereby a statement was turned into a comment to remove it from the code that's acted upon by the language's parser.
Code having been subjected to the opposite action, that of taking a comment and turning it into a statement to include it in the code that the language's parser acts upon, would be "Statemented In". But of course that's ridiculous.
That said, I agree with Andrew that outside of a transient debugging session, before it's committed, code should be removed if it's not used and not left around in comments to confuse things.

The problem is with the term "commented out". It's an abuse of the comment feature of most languages. In C/C++ you should use the preprocessor to conditionally compile your code, perhaps like the following:
#if 0
...
... here is the code that is not in the build
...
#endif
/* ...
   ... here is the code that is in the build
   ... */
I recall a coding standard in a place I used to work at used "#ifdef NOT_DEF" and a few other symbols in place of "#if 0" to add some semantics to the "commented out" block.
The terms used under this scheme were "included" and "excluded" code, though "included" was usually assumed when talking generally about "the code".
Of course, not all modern languages have such a preprocessing feature, so you're back to abusing the comment feature.

I ended up going with commented code being code "under comment" and normal code being code "not under comment". The action is to take code from under comment, or to bring it out from under comment.

We call it 'unremd'. You won't find it in the dictionary but it is the opposite of REM(d) and most importantly has fewer syllables...

How do you handle translation of text with markup?

I'm developing multi-language support for our web app. We're using Django's helpers around the gettext library. Everything has been surprisingly easy, except for the question of how to handle sentences that include significant HTML markup. Here's a simple example:
Please <a href="...">log in</a> to continue.
Here are the approaches I can think of:
Change the link to include the whole sentence. Regardless of whether the change is a good idea in this case, the problem with this solution is that UI becomes dependent on the needs of i18n when the two are ideally independent.
Mark the whole string above for translation (formatting included). The translation strings would then also include the HTML directly. The problem with this is that changing the HTML formatting requires changing all the translation.
Tightly couple multiple translations, then use string interpolation to combine them. For the example, the phrase "Please %s to continue" and "log in" could be marked separately for translation, then combined. The "log in" is localized, then wrapped in the HREF, then inserted into the translated phrase, which keeps the %s in translation to mark where the link should go. This approach complicates the code and breaks the independence of translation strings.
Are there any other options? How have others solved this problem?

Solution 2 is what you want. Send them the whole sentence, with the HTML markup embedded.
Reasons:
The predominant translation tool, Trados, can preserve the markup from inadvertent corruption by a translator.
Trados can also auto-translate text that it has seen before, even if the content of the tags has changed (but the number of tags and their position in the sentence are the same). At the very least, the translator will give you a good discount.
Styling is locale-specific. In some cases, bold will be inappropriate in Chinese or Japanese, and italics are less commonly used in East Asian languages, for example. The translator should have the freedom to either keep or remove the styles.
Word order is language-specific. If you were to segment the above sentence into fragments, it might work for English and French, but in Chinese or Japanese the word order would not be correct when you concatenate. For this reason, it is best i18n practice to externalize entire sentences, not sentence fragments.

2, with a potential twist.
You certainly could localize the whole string, like:
loginLink=Please <a href="...">log in</a> to continue
However, depending on your tooling and your localization group, they might prefer for you to do something like:
# tokens in this string add html links
loginLink=Please {0}log in{1} to continue
That would be my preferred method. You could use a different substitution pattern if you have localization tooling that ignores certain characters. E.g.
loginLink=Please %startlink%log in%endlink% to continue
Then perform the substitution in your jsp, servlet, or equivalent for whatever language you're using ...
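Here is a minimal Python sketch of the {0}log in{1} variant. It assumes the translated strings have already been loaded from the catalog; the dictionary, locale keys and URL are hypothetical. The translated text is HTML-escaped first, and only then are the trusted link tags substituted, so a stray < or & from a translator cannot break the markup:

```python
import html

# Hypothetical catalog entries; in practice these come from your resource bundle.
MESSAGES = {
    "en": "Please {0}log in{1} to continue",
    # Norwegian needs no "please" and starts with the link.
    "nb": "{0}Logg inn{1} for å fortsette",
}

def login_prompt(locale: str, url: str) -> str:
    # Escape the translator-supplied text, then insert the markup ourselves.
    # html.escape does not touch braces, so the {0}/{1} tokens survive.
    template = html.escape(MESSAGES[locale])
    return template.format(f'<a href="{html.escape(url)}">', "</a>")

print(login_prompt("en", "/login"))
print(login_prompt("nb", "/login"))
```

Because the tokens wrap words rather than splitting the sentence into fragments, the translator can put the link wherever the target language wants it. A translation containing literal braces would need them doubled ({{ and }}) for str.format.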

Disclaimer: I am not experienced in internationalization of software myself.
I don't think this would be good in any case - just introduces too much coupling …
As long as you keep formatting sparse in the parts which need to be translated, this could be okay. Giving translators the possibility to give special words importance (by either making them a link or by using <strong /> emphasis) sounds like a good idea. However, those translations with (X)HTML possibly cannot easily be used anywhere else.
This sounds like unnecessary work to me …
If it were me, I think I would go with the second approach, but I would put the URI into a formatting parameter, so that this can be changed without having to change all those translations.
Please log in to continue.
You should keep in mind that you may need to teach your translators a basic knowledge of (X)HTML if you go with this approach, so that they do not screw up your markup and so that they know what to expect from that text they write. Anyhow, this additional knowledge might lead to a better semantic markup, because, as mentioned above, texts could be translated and annotated with (X)HTML to reflect local writing style.
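Here is a sketch of keeping the URI out of the translation while the markup stays in. Python's standard gettext module stands in for Django's helpers. NullTranslations simply returns the source string, so this runs without a message catalog. The named %(url)s placeholder and the URL are illustrative assumptions:

```python
import gettext
import html

# Stand-in for a real catalog: returns every message untranslated.
_ = gettext.NullTranslations().gettext

def login_sentence(url: str) -> str:
    # The markup stays in the translatable string; only the URI is a
    # parameter, so moving the login page never invalidates translations.
    message = _('Please <a href="%(url)s">log in</a> to continue.')
    return message % {"url": html.escape(url)}

print(login_sentence("/accounts/login/"))
```

A named placeholder, rather than a bare %s, also lets translators reorder arguments if a string ever gains more than one.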

Whatever you do, keep the whole sentence as one string. You need to understand the whole sentence to translate it correctly.
Not all words should be translated in all languages: e.g. in Norwegian one doesn't use "please" (we can say "vær så snill", literally "be so kind", but when used as a command it sounds too forceful), so the correct Norwegian would be:
"Logg inn for å fortsette" lit.: "Log in to continue" or
"Fortsett ved å logge inn" lit.: "Continue by to log in" etc.
You must allow completely changing the order, e.g. in a fictional demo language:
"Für kontinuer Loggen bitte ins" (if it was real) lit.: "To continue log please in"
Some language may even have one single word for (most of) this sentence too...
I'd recommend solution 1, or possibly "Please %{startlink}log in%{endlink} to continue"; this way the translator can make the whole sentence a link if that's more natural, and the sentence can be completely restructured.

Interesting question; I'll be having this problem very soon. I think I'll go for 2, without any kind of tricky stuff. HTML markup is simple, URLs won't move anytime soon, and if anything is changed a new entry will be created in django.po, so we get a chance to review the translation (e.g. a script should check for empty translations after makemessages).
So, in template :
{% load i18n %}
{% trans 'hello world' %}
... then, after running python manage.py makemessages -l fr, I get this in my django.po:
#: templates/out.html:3
msgid "hello world"
msgstr ""
I change it to my needs
#: templates/out.html:3
msgid "hello world"
msgstr "bonjour monde"
... and in the simple yet frequent cases I'll encounter, it won't be worth any further trouble. The other solutions here seem quite smart, but I don't think the solution to markup problems is more markup. Plus, I want to avoid too much confusing stuff inside templates.
Your templates should be quite stable after a while, I guess, but I don't know what other trouble you expect. If the content changes over and over, perhaps that content's place is not inside the template but inside a model.
Edit: I just checked the documentation: if you ever need variables inside a translation, there is blocktrans.
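The answer above mentions a script that checks for empty translations after makemessages. Here is a rough sketch of one, using only the standard library. It is a naive parser that does not handle msgctxt, plural forms or fuzzy flags, and the sample catalog is made up. A dedicated .po library, or GNU gettext's msgfmt --statistics, would be more robust:

```python
def untranslated_msgids(po_text: str) -> list[str]:
    """Return the msgids in a .po file whose msgstr is empty.

    Naive sketch: handles msgid/msgstr with continuation lines, but not
    msgctxt, plural forms, or fuzzy flags; escape sequences are kept as-is.
    """
    missing: list[str] = []
    msgid = msgstr = field = None

    def finish_entry() -> None:
        # msgid "" is the catalog header, not a real message.
        if msgid and msgstr == "":
            missing.append(msgid)

    for raw_line in po_text.splitlines():
        line = raw_line.strip()
        if line.startswith('msgid "'):
            finish_entry()
            msgid, msgstr, field = line[7:-1], None, "msgid"
        elif line.startswith('msgstr "'):
            msgstr, field = line[8:-1], "msgstr"
        elif line.startswith('"') and field == "msgid":
            msgid += line[1:-1]
        elif line.startswith('"') and field == "msgstr":
            msgstr += line[1:-1]
        else:
            field = None  # a comment or blank line ends a continuation
    finish_entry()
    return missing

sample_po = '''
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"

#: templates/out.html:3
msgid "hello world"
msgstr ""

#: templates/out.html:4
msgid "goodbye"
msgstr "au revoir"
'''
print(untranslated_msgids(sample_po))
```

A deployment script could run this over each locale's django.po and fail the build when the returned list is non-empty.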

Makes no sense, how would you translate "log in"?
I don't think many translators have experience with HTML (the regular non-HTML-aware translators would be cheaper)
I would go with option 3, or use "Please %slog in%s to continue" and replace the %s with parts of the link.
