What characters are valid in DAML party names? - daml

I noticed that Navigator doesn't correctly handle party names that contain spaces. So I was wondering which other characters are illegal and where in the stack (Navigator, DAML-LF, etc.) the limitations apply. I couldn't find anything on this in the documentation. Can someone clarify?

There's something on this in the docs on the built-in primitive types, under Party:
The party text can only contain alphanumeric characters, -, _ and spaces.
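As a rough illustration of the rule quoted above, here is a minimal Python sketch that checks a candidate party text. The pattern is an assumption derived only from that one sentence (it reads "alphanumeric" as ASCII letters and digits), and the function name is made up for this example; it is not taken from the DAML source and may differ from what each layer of the stack actually enforces.

```python
import re

# Assumption: this pattern mirrors only the sentence quoted from the docs
# (ASCII alphanumerics, '-', '_' and spaces). It is not the DAML implementation.
PARTY_TEXT = re.compile(r"[A-Za-z0-9_\- ]+")


def looks_like_valid_party(text: str) -> bool:
    """Return True if text uses only the characters the docs sentence lists."""
    return PARTY_TEXT.fullmatch(text) is not None


print(looks_like_valid_party("Alice_1"))
print(looks_like_valid_party("Bob Smith"))
print(looks_like_valid_party("Carol@Bank"))
```

Note that a name passing this check (such as one with spaces) can still trip up an individual tool, which is what the question observed with Navigator.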

Related

Adding special symbols into HTML

What is considered best practice for adding special symbols into HTML? Using the symbol itself, e.g. ©, or its code value, &copy;?
Example 1:
<p>Qualcomm©</p>
Example 2:
<p>Qualcomm®</p>
Both have their pros and cons. There isn't a strongly established best practice.
Using a literal character:
Is easier to read
Doesn't require developers to remember the character reference code
Requires fewer bytes/characters to send over the network (or store in a database, which might be more significant).
Using a character reference:
May be easier to type (depending on the developer's keyboard)
Is immune to being screwed up by character encoding errors
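To make the trade-off concrete, here is a small Python sketch (Python is only used for illustration; the markup string is a made-up example). It converts a literal character into a numeric character reference, compares the sizes, and checks that both forms decode to the same text.

```python
import html

literal = "<p>Qualcomm©</p>"

# Replace every non-ASCII character with a numeric character reference.
escaped = literal.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(escaped)  # <p>Qualcomm&#169;</p>

# The literal form is shorter when stored or sent as UTF-8.
print(len(literal.encode("utf-8")), len(escaped.encode("ascii")))  # 17 21

# Both the numeric and the named reference decode back to the same character.
print(html.unescape(escaped) == literal)
print(html.unescape("<p>Qualcomm&copy;</p>") == literal)
```

The escaped form is pure ASCII, which is why it survives a wrongly declared character encoding, at the cost of a few extra bytes.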

How to use proper apostrophes in HTML instead of "dumb quotes"?

In the article Better web typography in a few simple steps, it says
Talking about apostrophes, the correct sign for them is the right single quotation mark. A dead give-away for amateur typography is the presence of straight quotation marks, also called 'dumb quotes' by type-savvy designers.
I've been using these "dumb quotes" all along!
Now, when one is writing regular HTML (and not Markdown, which automatically produces apostrophes), how is one supposed to sanely write correct apostrophes? Am I just supposed to inject &rsquo; wherever a ' would go before? Is there a program that automatically does this?
How do professional web designers take care of this problem?
You have a couple of options here:
As was pointed out before, use either numeric or named HTML entities.
Write your HTML with straight apostrophes and then do a search and replace before publishing. This is workable, but could lead to unexpected replacements if you aren't careful.
Insert the actual right single quotation mark using the appropriate keyboard sequence for your operating system (option-shift-] on a Mac or alt-0146 on a PC), and make sure to save and serve your HTML as UTF-8. That way you don't have to mess around with entity names, but it assumes a UTF-8 clean workflow.
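The search-and-replace option could be sketched roughly as follows in Python. The function name and the rule "only replace an apostrophe that sits between two word characters" are assumptions chosen for this example to reduce unexpected replacements; they are not an established tool or standard.

```python
import re

# Assumption: an apostrophe is typographic only when it sits between two
# word characters (don't, designer's). Quotes around attribute values such
# as href='x' are not matched by this rule.
APOSTROPHE = re.compile(r"(?<=\w)'(?=\w)")


def smarten_apostrophes(text: str) -> str:
    """Replace in-word straight apostrophes with U+2019."""
    return APOSTROPHE.sub("\u2019", text)


print(smarten_apostrophes("<p>It's the designer's choice, isn't it?</p>"))
print(smarten_apostrophes("<a href='x'>don't</a>"))
```

This still rewrites apostrophes inside script or pre elements if they happen to sit between word characters, so it is safer to run it on text content only.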

What is the meaning of -{ }- in MediaWiki wikitext?

In my MediaWiki wiki, any wikitext containing -{ }- is not parsed correctly. Do I need some extension?
Example:
-{Computer}-
The -{}- syntax is used by the rather poorly documented (but widely used, at least in some regions!) MediaWiki automatic language variant conversion feature (LanguageConverter), which converts text between different writing systems and local variants of a language, such as between simplified and traditional Chinese characters, or between the Cyrillic and Latin alphabets in Serbian.
Specifically, -{}- is used to manually override the automatic conversion, either for literal text (such as names or quotations) that should not be converted, or for special cases where the automatic conversion gets it wrong and needs to be overridden. The syntax for the latter case looks something like -{var1: Some text; var2: Something else}-, possibly with some flags at the beginning that change the behavior in various ways.
Alas, short of reading the code itself, I was unable to find any comprehensive documentation on what all these flags and such actually do. I do believe that there's some decent documentation available if you can read Chinese, but I can't, and the output of Google Translate leaves much to be desired.

Is it advisable to have non-ascii characters in the URL?

We are currently working on an I18N project. I am wondering what the complications of having non-ASCII characters in the URL are. If it's not advisable, what are the alternatives for dealing with this problem?
EDIT (in response to Maxym's answer):
The site is going to be local to a specific country, so I need not worry about the worldwide public accessing it. I understand that from a usability point of view it is really annoying. What are the other technical problems associated with this?
It is possible to use non-ASCII/non-Latin domain names using IDNA. Further, you can always use percent encoding (like %20 for space) in URLs. RFC 3986 recommends UTF-8 encoding combined with percents:
the data should first be encoded as octets according to the UTF-8 character encoding; then only those octets that do not correspond to characters in the unreserved set should be percent-encoded. (...) For example, the character A would be represented as "A", the character LATIN CAPITAL LETTER A WITH GRAVE would be represented as "%C3%80", and the character KATAKANA LETTER A would be represented as "%E3%82%A2".
Modern clients (web browsers) are able to transform back and forth between percent encoding and Unicode, so the URL is transferred as ASCII but looks pretty for the user.
Make sure you're using a web framework/CMS that understands this encoding as well, to simplify URL input from webmasters/content editors.
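The RFC examples quoted above can be reproduced with Python's standard library, as a quick sketch. `urllib.parse.quote` uses UTF-8 by default and leaves unreserved characters (and "/") untouched; the path used below is just an illustrative value.

```python
from urllib.parse import quote, unquote

# The three characters from the RFC 3986 excerpt.
print(quote("A"))        # A
print(quote("\u00c0"))   # %C3%80  (LATIN CAPITAL LETTER A WITH GRAVE)
print(quote("\u30a2"))   # %E3%82%A2  (KATAKANA LETTER A)

# A path with a non-ASCII character: "/" is kept, "ü" becomes two octets.
path = "/wiki/Fürth"
encoded = quote(path)
print(encoded)           # /wiki/F%C3%BCrth

# Decoding restores the readable form that browsers show to the user.
print(unquote(encoded) == path)
```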
I would say no. The reason is simple: if you rely on the worldwide public, it would be a big problem for people to type your URL. I live in the "Cyrillic" world, where it is possible to create Cyrillic URLs, but no one has succeeded with that, because even we are too lazy to switch keyboard layouts and are used to typing Latin...
Update:
I can't say much about alternatives, but some languages have an informal or formal letter substitute, e.g. in German you can write Ö, but in a URL you might see OE instead. You can also consider English words, or words with similar sounds (so people from your country can remember the spelling, and people from other "countries" won't be harmed).
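The letter-substitute idea could look like this in Python, as a minimal sketch. The mapping covers only the German letters mentioned in this thread, and the function name is made up for the example; other languages would need their own tables.

```python
import unicodedata

# Assumption: the conventional German substitutes (ä -> ae, ö -> oe, ü -> ue, ß -> ss).
GERMAN_SUBSTITUTES = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def ascii_substitute(name: str) -> str:
    """Replace German umlauts and ß with their ASCII substitutes."""
    # Normalize first so that "u" + combining diaeresis becomes a single "ü".
    return unicodedata.normalize("NFC", name).translate(GERMAN_SUBSTITUTES)


print(ascii_substitute("Nürnberg"))  # Nuernberg
print(ascii_substitute("Straße"))    # Strasse
```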
It depends on the target users. For example, Nürnberg.de is also available as nuernberg.de for the sake of accessibility, while native German users can type the former directly (the German keyboard is their default and has all four extra key symbols, öäüß). Do not forget that one of the goals of I18N is to provide a native-language feel to the end user. Mac and Linux users have even more intuitive ways to enter such characters; for example, pressing Alt+u on a Mac adds an umlaut to the next character typed.
I was just wondering what are the complications of having the non-ascii characters in the URL.
But the way you phrased your question, it seems to be more about URIs than URLs: you are trying to put a URN with non-ASCII characters inside a URI. There are no complications in that if you know where and how to parse the URN on the server (for example, on a Django-based server it can be parsed and handled using regexes in urls.py). All you need to keep in mind is that with the Web 2.0 (Ajax, JavaScript-based) evolution, everything mainly runs in UTF-8, as JavaScript's URI-encoding functions use UTF-8. UTF-8 has thus evolved into a sort of standard. Stick with UTF-8 and you will hardly face any complications in URI parsing.
For example, check the URI http://de.wikipedia.org/wiki/Fürth or http://hi.wikipedia.org/wiki/जर्मनी. Irrespective of the encoding you type it in the address bar with, the browser will translate it to UTF-8 and send it to the server.
NOTE: besides UTF-8, some symbols are encoded using percent-encoding. More about it can be found here:
http://en.wikipedia.org/wiki/Percent-encoding
You can use non-ASCII characters in a URL, but it's ugly because special characters must be encoded like this:
http://www.w3schools.com/tags/ref_urlencode.asp

Syntax highlight design pattern

I'm looking for some good overviews of best practices and common patterns for enabling syntax highlighting in a textbox. It seems like a very common exercise: almost all languages have a UI control that enables syntax highlighting for different languages. I'm just curious to see if there is a common pattern of implementation.
Is everyone using regular expressions? Is there a repository for regular expressions that are commonly used in syntax highlighting scenarios?
Are there alternative/better approaches to syntax highlighting?
Update
Links to relevant resources about performing syntax highlighting in a given language or concepts related to syntax highlighting would be great. Lexing (lexical analysis) was brought up in an answer but without a link to learn more. Anything to help better understand this commonly solved problem would be great.
Lexical Analysis on Wikipedia
Regular expressions are definitely where most people start out. However, they can't really cope with many edge cases that one meets in most languages: text that looks like keywords can be found in string literals, string literals in turn can contain escaped delimiters as well as special characters, and the same goes for comments, etc.
Basically to do a good job of syntax highlighting you need to perform lexing of the source - parsing it with the application of language-specific heuristics to build a list of regions, where each region of the source is annotated with how it is to be styled.
As edits take place, you can again apply language rules to see how far this change can alter the presentation of a region. For example typing a letter inside a string literal simply makes the string literal region longer, but typing a closing quote truncates the region and turns the leftover part of it into code, subject to all the other lexing rules.
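The region-based approach described above might be sketched like this in Python. The token rules describe a tiny hypothetical C-like language invented for this example, and a real editor would re-lex only the affected region rather than the whole source. The point is that a single left-to-right pass with ordered rules keeps keywords inside string literals from being styled as keywords, and that typing a closing quote changes the regions after it.

```python
import re

# Hypothetical rules for a tiny C-like language. Order matters: comments and
# strings come first so that keywords inside them are swallowed whole.
TOKEN_RULES = [
    ("comment", r"//[^\n]*"),
    # A string runs to its closing quote, or to end of line if unterminated.
    ("string", r'"(?:\\.|[^"\\\n])*"?'),
    ("keyword", r"\b(?:if|else|while|return)\b"),
    ("number", r"\b\d+\b"),
    ("ident", r"[A-Za-z_]\w*"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES))


def lex(source: str) -> list[tuple[int, int, str]]:
    """Return (start, end, style) regions; unmatched text stays unstyled."""
    return [(m.start(), m.end(), m.lastgroup) for m in MASTER.finditer(source)]


# The keyword inside the string literal is part of the string region.
print(lex('if "if"'))
# An unterminated string swallows the rest of the line...
print(lex('x = "return 1'))
# ...and typing a closing quote truncates it, turning the remainder into code.
print(lex('x = "re"turn 1'))
```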