What is the meaning of an "alternative" according to ECMA-262? - ecmascript-6

I've been wondering what the meaning of "alternative" is in ECMA-262.
I've seen the term "alternative" used many times in the spec.
Here are some examples:
quote taken from this section
so, in this example, the nonterminal ForStatement actually has four alternative right-hand sides.
quote taken from this section
A production that has multiple alternative definitions will typically have a distinct algorithm for each alternative
quote taken from this section
a production that has multiple alternative definitions will typically have for each alternative a distinct algorithm for each applicable named static semantic rule.
What does "a production that has multiple alternative definitions" mean?
I assume that "alternative" means the right-hand side of a production; here is a simple picture that shows what I mean.
In the picture, the area covered in pink is the whole production, and the area covered in red is the nonterminal.
Finally, I'm assuming that the area covered in purple is the alternative.
A production that has multiple alternative definitions will typically have a distinct algorithm for each alternative
However, it still doesn't sound right: how can one individual production have multiple alternatives?

The word has its normal, English meaning:
offering or expressing a choice
So to take the first instance:
so, in this example, the nonterminal ForStatement actually has four alternative right-hand sides.
And just before that it lists them:
for ( LexicalDeclaration ; ) Statement
for ( LexicalDeclaration ; Expression ) Statement
for ( LexicalDeclaration Expression ; ) Statement
for ( LexicalDeclaration Expression ; Expression ) Statement
Four alternative things you can put on the right-hand side of the keyword for.

In formal language theory, a production has a left-hand side and a right-hand side. But in less formal contexts like the ECMAScript spec, it's common to group productions that have the same left-hand side.
So in a formal context, you might see:
A : B
A : C D
and you would say "There are 2 productions, each with a LHS and a RHS."
But in the ECMAScript spec, you might see:
A :
B
C D
and you would say "There is 1 production, with a LHS and 2 alternatives."
(This avoids confusion over whether "right-hand side" would refer to everything after the colon, or just a single line.)
So when you ask "how can one individual production have multiple alternatives?", it sounds like you're thinking of the formal context, where indeed it wouldn't make sense. But it does make sense in the less formal context.
(Note that the ECMAScript spec actually uses both terminology schemes, but it's usually not difficult to tell which is intended.)
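To make the grouping concrete, here is a small illustrative sketch in Java (the class and map are hypothetical names of mine; the spec defines no such data structure) that models one production whose left-hand side groups the four ForStatement alternatives quoted above:

```java
import java.util.List;
import java.util.Map;

public class GrammarSketch {
    // One production: a left-hand side (the nonterminal) mapped to the
    // list of its alternatives (the grouped right-hand sides).
    static final Map<String, List<String>> PRODUCTIONS = Map.of(
        "ForStatement", List.of(
            "for ( LexicalDeclaration ; ) Statement",
            "for ( LexicalDeclaration ; Expression ) Statement",
            "for ( LexicalDeclaration Expression ; ) Statement",
            "for ( LexicalDeclaration Expression ; Expression ) Statement"));

    public static void main(String[] args) {
        // The single ForStatement production has four alternatives.
        System.out.println(PRODUCTIONS.get("ForStatement").size()); // prints 4
    }
}
```

In the formal scheme, each of those four lines would instead count as a separate production.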


Hyphenating arbitrary text automatically

What kinds of challenges are there in automatic hyphenation? It seems you could just lay out text word by word, breaking when the length of the line exceeds the length of the viewport (or whatever we're wrapping our text in), and placing a hyphen after as many characters as fit (provided at least two characters fit and the word is at least four characters long), skipping words that already contain a hyphen (there's no requirement that every word be hyphenated).
But I note how Firefox and IE need a dictionary to be able to hyphenate with CSS's hyphens. This seems to imply that there are further issues regarding where we can place hyphens.
What kinds of issues are these? Do any exist in the English language or do they only exist in other languages?
You have these issues in all languages. You can only place a hyphen where the split produces meaningful tokens, as has already been pointed out. You don't want to, for example, split a word like "wr-ong".
Such a token may or may not need to be a syllable, though in most languages (including English) it is. The main point is that you cannot pin down syllable boundaries with a few simple rules: you would need to consider a lot of phonology to get a highly accurate result, and those rules vary from language to language.
With this background, I can see why one would take a dictionary instead, and frankly, being a computational linguist myself, this is also what I would probably opt for.
If you DO want to go for an automatic solution, I would recommend doing some research into the phonology of English syllables, i.e. so-called syllabification. You might want to start with this article on Wikipedia:
Wikipedia - Syllabification
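For reference, the naive word-by-word scheme described in the question can be sketched like this (a toy illustration, not a production algorithm; edge cases such as a word fragment longer than a whole line are ignored). It happily produces splits like "wr-ong", which is exactly why real hyphenators need phonological rules or a dictionary:

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveWrapper {
    // Greedy wrap: fill each line word by word. If a word doesn't fit, split it
    // with a hyphen, provided at least two of its characters fit and the word is
    // at least four characters long; words already containing a hyphen stay whole.
    static List<String> wrap(String text, int width) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.split("\\s+")) {
            int sep = line.length() == 0 ? 0 : 1;        // space before the word
            if (line.length() + sep + word.length() <= width) {
                if (sep == 1) line.append(' ');
                line.append(word);
                continue;
            }
            int room = width - line.length() - sep - 1;  // minus 1 for the hyphen
            if (room >= 2 && word.length() >= 4 && !word.contains("-")) {
                if (sep == 1) line.append(' ');
                line.append(word, 0, room).append('-');
                lines.add(line.toString());
                line = new StringBuilder(word.substring(room));
            } else {
                if (line.length() > 0) lines.add(line.toString());
                line = new StringBuilder(word);
            }
        }
        if (line.length() > 0) lines.add(line.toString());
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(wrap("this is wrong", 11)); // [this is wr-, ong]
    }
}
```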

l20n with HTML markup?

How would I use l20n if I wanted to create something like this:
About <strong>Firefox</strong>
I want to translate the phrase as a whole but I also want the markup. I don't want to have to do this:
<aboutBrowser "About {{ browserBrandShortName }}">
<aboutBrowserStrong "About <strong>{{ browserBrandShortName }}</strong>">
...as the translation itself is now duplicated.
I understand that this might not be in the scope of l20n, but it is probably a common enough case in the real world. Is there some kind of established way to go about this?
Sometimes duplicating the translation is the best thing you can do. Redundancy is good in localization: it lets you make fewer assumptions about translations into other languages. One of the core principles of L20n is that only the localizer knows what they really need.
Your solution is actually okay
The solution that you proposed is actually quite good. It's entirely possible that the emphasis that you're trying to express with <strong> will have some unknown implications in some language that we might not be aware of. For instance, some languages might use declensions or postpositions to mean "about something", in which case you—as a developer—shouldn't make too many assumptions about the exact position of the <strong> element. It might be that the entire translation will be a single word surrounded by <strong>.
Here's your code again, formatted using L20n's multiline string literals:
<aboutBrowser "About {{ browserBrandShortName }}">
<aboutBrowserEmphasized """
About <strong>{{ browserBrandShortName }}</strong>
""">
Note that for this to work as expected, you'll need to add a data-l10n-overlay attribute to the DOM node with data-l10n-id=aboutBrowserEmphasized. Otherwise, < and > will be escaped.
Making few assumptions matters
Let me digress quickly and bring up Bug 859035 — Do not use the same "unknown" entity for Size & Author when installing a WebApp from Firefox OS. The English-speaking developer assumed that they could use the adjective "unknown" to qualify both the size and the author in the installation dialog. However, in certain languages, like French or Polish, the adjective must agree with the noun in gender and number. So even though in English we only need one string:
<unknown "Unknown">
…other languages might require two separate strings for each of the contexts they're used in. In French, you'd say "auteur inconnu" (unknown author, masculine) but "taille inconnue" (unknown size, feminine):
<unknownSize "inconnue">
<unknownAuthor "inconnu">
In English, this means some redundancy:
<unknownSize "Unknown">
<unknownAuthor "Unknown">
…but that's OK, because in the end the quality of localization is improved. It is generally a good practice to use unique strings everywhere and reuse sparingly. Ideally, you'd allow different translations for all strings. Even something as simple and common as "Yes" and "No" can be tricky if you consider languages like Welsh:
Welsh doesn't have a single word to use every time for yes and no questions. The word used depends on the form of the question. You must generally answer using the relevant form of the verb used in the question, or in questions where the verb is not the first element you use either 'ie' / 'nage'.

What's a good Lucene analyzer for text and source code?

What would be a good Lucene analyzer to use for documents that are a mix of text and diverse source code?
For example, I want "C" and "C++" to be considered different words, and I want Charset.forName("utf-8") to be split between the class name and method name, and for the parameter to be considered either one or two words.
A good example dataset for what I'd like to look at is StackOverflow itself. I believe that StackOverflow uses Lucene.NET for search; does it use a stock analyzer, or has it been heavily customized?
You're probably best off using the WhitespaceTokenizer and customizing it to strip off punctuation. For example, we strip all punctuation except '+' and '-', so that words such as C++ are kept, but opening and closing quotes, brackets, etc. are removed. In reality, though, for something like this you might have to add the document twice, using different tokenizers to catch the different parts of the document: once with the StandardTokenizer and once with a WhitespaceTokenizer. The StandardTokenizer will split all your code, e.g. between class and method names, while the whitespace one will pick up words such as C++. Obviously it also depends on the language, as e.g. Scala allows some punctuation characters in method names.
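Without pulling in Lucene itself, the whitespace-plus-selective-punctuation-stripping rule can be sketched in plain Java (illustrative only; in a real index this logic would live in a custom tokenizer or token filter):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CodeAwareTokens {
    // Split on whitespace, then strip punctuation except '+' and '-',
    // so "C++" survives while surrounding quotes and brackets are dropped.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.split("\\s+"))
                .map(t -> t.replaceAll("[^\\p{Alnum}+\\-]", ""))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(tokenize("(C++) and \"C\" differ")); // [C++, and, C, differ]
    }
}
```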

Comparing instances of concepts in Semantic Web?

I am new to the Semantic Web, and don't quite know the terminology: what do we call instances of the same concept, or of the same inherited concept? Can we call two instances equal if they belong to the same concept or subconcept?
Two instances of the same concept are in the same class. You can't really say anything more than that. Suppose you have a concept Colour, and two instances red and green. They (presumably) aren't equal, but they are both members of the Colour class, and may jointly be members of other classes as well (e.g. PrimaryColours, TrafficLightColours).
Note that I say that red and green may not be equal. In the semantic web, we generally make the open world assumption, i.e. we don't assume that we have all of the relevant information yet, and we don't make the unique name assumption, so things with different names may denote the same thing. So unless red and green are explicitly stated to be different (owl:differentFrom), it's possible that, under the open world assumption, new information could show up to say, or infer, that they actually denote the same resource (owl:sameAs).
The equals method on a Jena Resource works out whether one resource is the same as another, not whether it has the same type as another. To work that out, something like this will suffice:
Property rdfType = model.createProperty(
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#", "type");
Resource type = model.createResource("http://typeUri");
if (resource1.hasProperty(rdfType, type) && resource2.hasProperty(rdfType, type)) {
    // both resources are the same type
}

Is it good to generate dynamic keywords every time the page is loaded? (SEO)

For example: generate 10 random keywords from the web content.
Thanks.
UPDATE: this is for SEO.
I'm assuming you mean for display ...
The only advantage I can think of is that search engines might possibly go "This page is updated frequently, we should check it more often", maybe. I'm not up enough on the latest search engine workings to say if this would actually work or not. I wouldn't trust it to.
Disadvantages depend on usage, but I can't picture any scenario where it's immensely helpful to be "random". If you describe the reasoning that led you to this conclusion, we can tell you whether it's sound. My gut feeling, however, is no. If you want to display summary data, then "random" shouldn't fit into the equation, or at least not at the top level. You should first filter the content based on some useful criteria, then apply randomness at the last step if necessary.
Example Process:
Filter out words on the stop list ("if", "is", "you", etc.).
Count occurrences of words; prefer words with high occurrence counts.
Prefer words which aren't featured prominently in other content items.
If more than 10 words remain, randomly select 10 from the better scorers.
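That process can be roughly sketched as follows (a hypothetical helper of mine, not a recommendation of specific thresholds; step 3, comparing against other content items, is omitted for brevity):

```java
import java.util.*;
import java.util.stream.*;

public class KeywordPicker {
    // Steps: drop stop words, count occurrences, keep the best scorers,
    // and only then pick randomly among them.
    static List<String> keywords(String content, Set<String> stopWords, int n) {
        Map<String, Long> counts = Arrays.stream(content.toLowerCase().split("\\W+"))
                .filter(w -> !w.isEmpty() && !stopWords.contains(w))
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
        List<String> pool = counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .limit(2L * n)               // pool of better scorers
                .collect(Collectors.toList());
        Collections.shuffle(pool);           // randomness only at the last step
        return pool.subList(0, Math.min(n, pool.size()));
    }

    public static void main(String[] args) {
        Set<String> stop = Set.of("if", "is", "the");
        System.out.println(keywords("the cache is warm if the cache is primed", stop, 2));
    }
}
```

Note that the output varies between runs by design; only the pool it draws from is deterministic.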
Keywords for this post: I, of, is, it, to, but, you, on, we, if.