MediaWiki get first letter of a string

MediaWiki get first letter of a string - mediawiki

I want to make a function that will choose "a" or "an" depending on the noun. I am getting that noun from {{PAGENAME}}, which will have Opal, Emerald, Diamond etc.
Is there a parser function or something that I can use to get the first letter? Will also accept extensions or templates that do the same thing.
(I know in English that a word starting with a vowel doesn't always use "an", but for my purposes this doesn't concern me much)
Thanks!

See Template:A or an on English Wikipedia. Note that it relies on other templates and modules that also need to be created on your wiki, which means you'll need the Scribunto extension.

Related

What's a good Lucene analyzer for text and source code?

What would be a good Lucene analyzer to use for documents that are a mix of text and diverse source code?
For example, I want "C" and "C++" to be considered different words, and I want Charset.forName("utf-8") to be split between the class name and method name, and for the parameter to be considered either one or two words.
A good example dataset for what I'd like to look at is StackOverflow itself. I believe that StackOverflow uses Lucene.NET for search; does it use a stock analyzer, or has it been heavily customized?

You're probably best to use the WhitespaceTokenizer and customize it to strip off punctuation. For example we strip of all puncutation except '+', '-' so that words such as C++, etc... are left but opening and closing quotes and brackets, etc are left. In reality though for something like this you might have to add the document twice using different tokenizers to catch the different parts of the document. i.e. once with the StandardTokenizer and once with a WhitespaceTokenizer, in this case the StandardTokenizer will split all your code, e.g. between class and method names as the Whitespace one will pick-up the words such as C++. Obviously it kind of depends on the language though as e.g. Scala allows some punctuation characters in method names.

which is better to add two names (-) or(_)

hi when i write css or html i found that i want add two name like this
web-development
web_development
which one is better according SEO or write style name, file name or image name.

The first one is better. Also see this post by Google employee Matt Cutts: http://www.mattcutts.com/blog/dashes-vs-underscores/

use the dash. Google engines don't really parse underscores. This is maybe for programmers sanity, so that when they search for query_function, they get results they are looking for?
If you have a url like "http://example.com/web-site", google will return results for 'web', 'site' and '"web site"'. This is not the case for underscores: web_site will only return results for web_site.
ps.
I also think that dashes are better than underscores for usability purposes: a dash is a single button on the keyboard, while an underscore requires two buttons to be pressed. This has nothing to do with the technical side of SEO, but everything to do with usability, which is more important than SEO imo.

for css i don't think there is some issues with naming methodology, but for naming HTML pages - is preferred as search engines take - as space, even though good page name is not enough for good s.e.o. you need to have proper meta tag and keywords.
And make sure all your images have proper title tag, this is real essential.

Isn't it common practice to use the - to connect two words, and the _ to replace a space in situations where you can't use a space/+ sign, like CSS classNames?

first one is better in terms of SEO. Because the priority of hiphen is greater than under score

Please list two (2) words in the English language that use underscores ("_") within them.
Now list fifty (50) words that use dashes/hyphens ("-").
My opinion is that the hyphens would be a better solution for SEO.
IMO When it comes down to SEO is that everything makes a difference !

You are dealing with two different problems: URLs and CSS.
For URLs, hyphens would be the better choice because of SEO.
However, depending on your editing program, underscores might work better for mutli-word class names. In TextMate for instance, I can hit Esc to finish (auto-complete) a class I previously entered. It stops completing when it encounters a hyphen, but will fill in the whole class name when you use an underscore. If this is not the case for your editor, then it is really up to your preference.

change mediawiki namespaces?

Is there a way to reassign or change the core namespaces of mediawiki? For example, I'm having difficulty linking to a page I want to call "Template" because mediawiki has a namespace already for Template. I'd like to re-assign the mediawiki "template" namespace to something else.
Any thoughts?

You can partly change the name, under localsettings.php, add extra namespaces with the code number of template like adding main as 0, main:Main Page will redirect to Main Page

Basically, no.
You can change the display names of the namespaces in Special:AllMessages, and you can make aliases for the namespaces with $wgNamespaceAliases, but I don't think you can actually change the underlying names.
For example, to go to the talk page for Stack Overflow on the german wikipedia you can use http://de.wikipedia.org/wiki/Talk:Stack_Overflow or http://de.wikipedia.org/wiki/Diskussion:Stack_Overflow and they both take you to the same place.
BUT: The english wikipedia there is a page called Template and I just tested by making a page called Template on my wiki with no problems. So maybe it isn't the template namespace interfering. When I made a link to Template on my wikipedia userpage with [[Template]] it linked to the article Template, not to the name space.

I would advise very strongly against changing the names of any standard namespace. A name like "template" is so generic that surely you can find something else. For instance if you want to store code for C++ templates, call the namespace "Cpp_template" or "template_code".
Nothing in mediawiki prevents you from just using a colon as a prefix in a name, giving you exactly the same syntax as "supported" namespaces. I use that often. If it becomes helpful to differentiate those namespaces, e.g. for searching, then yes you can support them by editing LocalSettings.php (get the capitalization right folks, it matters in English, Linux shells and mediawiki).
For instance if I want to mark out a term as potentially problematic or biased, I use a "term:" prefix, for things like "term:Make_America_Great_Again" or "term:MAGA" which means the exact opposite of what its proponents claim. If I want to mark out verb phrases used in a user interface, it's "verb:delete" etc. For most of these it's not actually necessary to "support" the namespace.
You should ruthlessly customize namespaces and (even more so) categories for your task and purposes. If you are copying categories from some other wiki, you are probably doing it wrong. If you have not created any custom namespaces at least in the informal way I suggest, you are again probably doing it wrong.
But renaming standard namespaces or pages is a bad idea. You can use redirects for some of the same purposes. For instance, some "Special" pages have confusing and semantically inappropriate names that don't fit Wikipedia's or English language conventions. So I always redirect things like "Special:Wantedpages" with a new name like "open_links_in_this_wiki" which is semantically exact and doesn't tell people to just go create "wanted" pages with garbage in them instead of waiting to figure out if the name is appropriate or converging.
As a general rule, if you can't use the name easily in an English language sentence and [[just put brackets around it]], you need a redirect or rename.

How do you handle translation of text with markup?

I'm developing multi-language support for our web app. We're using Django's helpers around the gettext library. Everything has been surprisingly easy, except for the question of how to handle sentences that include significant HTML markup. Here's a simple example:
Please log in to continue.
Here are the approaches I can think of:
Change the link to include the whole sentence. Regardless of whether the change is a good idea in this case, the problem with this solution is that UI becomes dependent on the needs of i18n when the two are ideally independent.
Mark the whole string above for translation (formatting included). The translation strings would then also include the HTML directly. The problem with this is that changing the HTML formatting requires changing all the translation.
Tightly couple multiple translations, then use string interpolation to combine them. For the example, the phrase "Please %s to continue" and "log in" could be marked separately for translation, then combined. The "log in" is localized, then wrapped in the HREF, then inserted into the translated phrase, which keeps the %s in translation to mark where the link should go. This approach complicates the code and breaks the independence of translation strings.
Are there any other options? How have others solved this problem?

Solution 2 is what you want. Send them the whole sentence, with the HTML markup embedded.
Reasons:
The predominant translation tool, Trados, can preserve the markup from inadvertent corruption by a translator.
Trados can also auto-translate text that it has seen before, even if the content of the tags have changed (but the number of tags and their position in the sentence are the same). At the very least, the translator will give you a good discount.
Styling is locale-specific. In some cases, bold will be inappropriate in Chinese or Japanese, and italics are less commonly used in East Asian languages, for example. The translator should have the freedom to either keep or remove the styles.
Word order is language-specific. If you were to segment the above sentence into fragments, it might work for English and French, but in Chinese or Japanese the word order would not be correct when you concatenate. For this reason, it is best i18n practice to externalize entire sentences, not sentence fragments.

2, with a potential twist.
You certainly could localize the whole string, like:
loginLink=Please log in to continue
However, depending on your tooling and your localization group, they might prefer for you to do something like:
// tokens in this string add html links
loginLink=Please {0}log in{1} to continue
That would be my preferred method. You could use a different substitution pattern if you have localization tooling that ignores certain characters. E.g.
loginLink=Please %startlink%log in%endlink% to continue
Then perform the substitution in your jsp, servlet, or equivalent for whatever language you're using ...

Disclaimer: I am not experienced in internationalization of software myself.
I don't think this would be good in any case - just introduces too much coupling …
As long as you keep formatting sparse in the parts which need to be translated, this could be okay. Giving translators the possibility to give special words importance (by either making them a link or probably using <strong /> emphasis sounds like a good idea. However, those translations with (X)HTML possibly cannot be used anywhere else easily.
This sounds like unnecessary work to me …
If it were me, I think I would go with the second approach, but I would put the URI into a formatting parameter, so that this can be changed without having to change all those translations.
Please log in to continue.
You should keep in mind that you may need to teach your translators a basic knowledge of (X)HTML if you go with this approach, so that they do not screw up your markup and so that they know what to expect from that text they write. Anyhow, this additional knowledge might lead to a better semantic markup, because, as mentioned above, texts could be translated and annotated with (X)HTML to reflect local writing style.

What ever you do keep the whole sentence as one string. You need to understand the whole sentece to translate it correctly.
Not all words should be translated in all languages: e.g. in Norwegian one doesn't use "please" (we can say "vær så snill" literally "be so kind" but when used as a command it sounds too forceful) so the correct norwegian vould be:
"Logg inn for å fortsette" lit.: "Log in to continue" or
"Fortsett ved å logge inn" lit.: "Continue by to log in" etc.
You must allow completely changing the order, e.g. in a fictional demo language:
"Für kontinuer Loggen bitte ins" (if it was real) lit.: "To continue log please in"
Some language may even have one single word for (most of) this sentence too...
I'll recommend solution 1 or possibly "Please %{startlink}log in%{endlink} to continue" this way the translator can make the whole sentence a link if that's more natural, and it can be completely restructured.

Interesting question, I'll be having this problem very soon. I think I'll go for 2, without any kind of tricky stuff. HTML markup is simple, urls won't move anytime soon, and if anything is changed a new entry will be created in django.po, so we get a chance to review the translation ( ex: a script should check for empty translations after makemessages ).
So, in template :
{% load i18n %}
{% trans 'hello world' %}
... then, after python manage.py makemessages I get in my django.po
#: templates/out.html:3
msgid "hello world"
msgstr ""
I change it to my needs
#: templates/out.html:3
msgid "hello world"
msgstr "bonjour monde"
... and in the simple yet frequent cases I'll encounter, it won't be worth any further trouble. The other solutions here seems quite smart but I don't think the solution to markup problems is more markup. Plus, I want to avoid too much confusing stuff inside templates.
Your templates should be quite stable after a while, I guess, but I don't know what other trouble you expect. If the content changes over and over, perhaps that content's place is not inside the template but inside a model.
Edit: I just checked it out in the documentation, if you ever need variables inside a translation, there is blocktrans.

Makes no sense, how would you translate "log in"?
I don't think many translators have experience with HTML (the regular non-HTML-aware translators would be cheaper)
I would go with option 3, or use "Please %slog in%s to continue" and replace the %s with parts of the link.

Best practices for internationalizing web applications?

Internationalizing web apps always seems to be a chore. No matter how much you plan ahead for pluggable languages, there's always issues with encoding, funky phrasing that doesn't fit your templates, and other problems.
I think it would be useful to get the SO community's input for a set of things that programmers should look out for when deciding to internationalize their web apps.

Internationalization is hard, here's a few things I've learned from working with 2 websites that were in over 20 different languages:
Use UTF-8 everywhere. No exceptions. HTML, server-side language (watch out for PHP especially), database, etc.
No text in images unless you want a ton of work. Use CSS to put text over images if necessary.
Separate configuration from localization. That way localizers can translate the text and you can deal with different configurations per locale (features, layout, etc). You don't want localizers to have the ability to mess with your app.
Make sure your layouts can deal with text that is 2-3 times longer than English. And also 50% less than English (Japanese and Chinese are often shorter).
Some languages need larger font sizes (Japanese, Chinese)
Colors are locale-specific also. Red and green don't mean the same thing everywhere!
Add a classname that is the locale name to the body tag of your documents. That way you can specify a specific locale's layout in your CSS file easily.
Watch out for variable substitution. Don't split your strings. Leave them whole like this: "You have X new messages" and replace the 'X' with the #.
Different languages have different pluralization. 0, 1, 2-4, 5-7, 7-infinity. Hard to deal with.
Context is difficult. Sometimes localizers need to know where/how a string is used to make sure it's translated correctly.
Resources:
http://interglacial.com/~sburke/tpj/as_html/tpj13.html
http://www.ryandoherty.net/2008/05/26/quick-tips-for-localizing-web-apps/
http://ed.agadak.net/2007/12/one-potato-two-potato-three-potato-four

In my company all our strings are stored in *.properties files. Our build tools build a "test languange" copy of the properties files, which replace a string like this:
Click here
with something like this:
[~~ Çļïčк н∑ѓё ~~ ﾀｳ ~~]
Now, when we set the language to "test" in our config files, these properties files are used. (And of course we don't ship the test language files).
This allows us to:
Make sure that Unicode characters are displayed correctly, including Japanese/Chinese/Korean.
Make sure that the layout scales appropriately for languages with longer words (German in particular has longer words on average than English).
Spot any hard-coded strings (as they will be in plain-English).
As for the actual translation, this is done by professional translators, not developers.

As an English person living abroad I have become frustrated by many web application's approach to internationalization and have blogged about my frustrations.
My tips would be:
think about how you show an international version of a page
using geolocation might work for many users, but as my examples show for many it will not
why not use the Accept-Language header for determining which language to serve
if a user accesses a page via a search engine then don't redirect them somewhere else e.g. to a homepage in a different language
it's extremely annoying to change language and have a different page reload - either serve the same page or warn the user that the current content is not available in a different language before redirecting them
English is a very common language, so perhaps default to that
But make sure the change language option is clear on the GUI (I like what Google Maps are doing, as shown in the post)
All I see on the Web is companies getting internalization wrong. Getting it right from a user's perspective is tricky indeed.

I have a couple apps that are "bilingual"
I used resource files in ASP.NET1.1
There is also something called the String Resource Tool
Basically you put all your strings in a .RES file for both languages and then determine what file to read from based on Culture or whether someone clicked a Link for the language
The biggest gotcha is making sure the Translations are done correctly

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008