Multi-language Translation on Web Site - html

i want to know how did facebook do it, did they add two different html+css or its just css only for changing there theme for different language or is there any special html attribute that changes the direction for complete site? following are two example, one english and other is arabic.
also i want to add another related question, which is, do you think they translated using some api like google api or did they hard code the translation (hiring someone to do the translation)?
Example picture 1
Example picture 2

Depending on what technology you're using, this concept is known as string externalizing, string resourcing, string internationalization, localization etc. It is possible to do it all in CSS+Javascript, but that wouldn't be a very efficient way to go about doing things, especially if your site had a lot of strings and a lot of translations.
The HTML is different - just look at the HTML source if you're curious. The source is different because the in the code behind the website's front end, strings like "Login" are stored externally in a collection file that might look something like this:
## LANGUAGE = ENGLISH ##
LOGIN = "Login"
PASSWORD = "Password"
When you switch languages, the code behind the front end remains the same, but the code then uses a different external language file. For example, might be the Spanish file for the same application:
## LANGUAGE = SPANISH ##
LOGIN = "Iniciar sesión"
PASSWORD = "contraseña"
The idea is that in order to support new languages, all that needs to be done is to have the original identifier translated into a new language file. The translator doesn't have to be a programmer to translate the above snippit easily.
The final comment is that Facebook has enough money to pay professional translators to provide them with very good translations in many world languages. A long time ago, they allowed users to submit translations as a starting point. It generally is a bad idea to use a free translation API to translate application strings, because most of the time those APIs will not get the grammar correct. Translation APIs are most effective at getting the "overall meaning" of some words and phrases right, but it can also be terribly inaccurate at getting the most-correct word translation for any one particular idiom.

The layout change facebook did is with css changes, they have two separate css, one for right to left and other is left to right, but if there was another language inside english or same vice verse , then they literally use html direction tag, to direct the message in box with right direction.

Related

How can I get all the ACM Computing Classification System tags as strings?

So I'm writing an app where there's a HTML tag which should have all the ACM CCS tags/fields as options and basically users should be able to select one or more of these tags and assign them.
However how can I go about getting all the ACM CCS tags as strings in the first place, so I can create the element with the proper options in the first place?
Getting them manually isn't really an option as there are probably thousands of them by the looks of it.
Considering they're part of the academic culture I thought they'd already been extracted at some point by someone in a certain format and then shared somewhere but I couldn't find any such thing.
For reference, if you don't know what I'm talking about, I want all the names of all the CCS tags in no particular order from this site: https://dl.acm.org/ccs
So for example, in the end I wanna obtain such strings as "General and reference", "Hardware" , etc.
I've looked into web crawlers and the likes but I seriously have no clue how to even use one, let alone build one. As far as I've seen all these names are stored inside <li> tags so if I had something that'd "parse" the whole site with all its pages and then store the content of all the <li> tags it encounters as strings that would work.

Ultimate Website Testing String

I've been grappling with the fraught area of escaping user (text) input for web pages. The ultimate goal is to have user input displayed and stored exactly as typed in, without breaking anything.
To that end I have been using the following test string :
'"_$%^&*()+=-£{}[]/n/<>\#~;|,.?#:!&``"'
It seems to work well (even Stack Overflow or Twitter is not immune, hence the back ticks). My question is, will this string capture most escaping problems, for example going from a web page via Ajax and to a database and back again?
In fact how do I display this string in Stack Overflow without the back ticks?
Is there a better one, e.g. say one that will highlight encoding problems too?
When I'm testing, I'm using something like this
a’b<’>",!"/%$?$&?%(()%/"!"/&?%$/"&$/"?%&?-f¯Ñ112üêù
This is generally sufficient to highlight encoding issues, at least from what I can see.
Including a mathematical symbol such as unicode x2202 might be useful too.
That seems like it should be all of them. The smartest thing to do would be to (depending on the language you're using) use a library that has been well tested, that can sanitize user input. Just ask around what other websites use.
See here: http://gendoh.com/2511063
The post itself is written in Korean, but you could see what makes difference between several given patterns. (V1 to V3 are for generic web apps while V4 and V5 is for javascripts.)

Why do I need Markdown?

Why do I need a Markdown with a front edit editor like WMD? What does the markdown do to the content that’s sent from the WMD editor?
How does Markdown store the content in the backend? Is it the same way like *bold* or in some other format? Why can’t I just do an html encode?
Sorry if I sounded very naïve.

			
				
It's probably helpful to take a step back and ask some of the larger questions. The issue Markdown is trying to solve is that of rich editing in the browser. Consider this: At some point, for any piece of software to enable rich text it has to describe the richness in a some manner, however that may be.
We could call that description of richness (by description of richness I mean like "this bit of text is bold" or "this bit of text is a hyperlink), we could call that description of richness "markup" -- it marks up the text with meta "richness".
Implementations of rich text can take on two approaches, either a.) hide the markup from the user or b.) let them have access to the markup.
For those who choose to hide it, the end result is very often WYSIWYG. The user is oblivious to what is happening behind the scenes. The editor takes care of the details. Think MS Word as an example. No one manipulates the Word markup format as a regular end user.
For implementations which choose to expose the markup, a markup language is then in order to allow users to interacat with it. Such markup languages would be things like HTML doing <tag> or BB code for example, doing things like [tag].
Markdown is one such of these languages.
As opposed to the former types I mentioned, Markdown has tried to design itself so that the markup renders common ASCII people already use. For example, it's common for people to asterisk their text to set it off, *important*, and this notation in Markdown is an indicator of italic.
In regards to storage, as Stephan pointed out, the system will most likely store the raw markdown, because the user will most likely need to have the possibility of editing, and the original markdown can be recalled for that purpose.
In most of the systems I've built, I store the markdown, and then normalize it to a 2nd field which caches the HTML rendering of the markdown. This way I don't have to do markdown->HTML rendering for every markdown field. It takes a little more space, but I'd rather the user have a faster response than use less DB storage space.
Care should also be taken when accepting Markdown from the browser, as it can easily contain <script> tags which need to be filtered out. Most markdown implementations will also recognize HTML intermingled with Markdown formatting, as so to be safe, you need to make sure your inputs and caches are sanitized properly.
The reason for using an alternate encoding system other than HTML is for security
Markdown and other such wiki style encoding systems do not usually support scripting languages
HTML supports scripting languages in many ways (
The two main security issues are:
Malware criminals use scripts in user generated content to attempt malware actions on the content readers computer by scripting to access known security holes
Free loaders using scripts to subvert the rest of the site by changing the content frame or styles i.e. ads, menu's, logos etc. This can also be criminal behaviour if not just annoying
By using an intermediate language such as Markdown you have total control on the rendered output
Filtering HTML is possible, but is also complex and risky
The other significant reason for an alternate encoding system is enforcement of style. Normal HTML has too many options. By limiting the available options, users can only use certain styles. The usually makes for cleaner looking and more readable content (compare SO to Ebay)
The main reason for using Markdown is the readability of a marked text. For instance, you can send it in a plain-text email and the reader will still understand the emphiasis, bullets, the text will be divided in paragraphs et cetera.
When you ask about storing data, it depends. If you enable Markdown in the WordPress blog engine, it stores data as the user has input it - in Markdown. In Stack Overflow, however, it seems like the data is stored as HTML. At least, the "Stack Overflow data dumps" contain HTML, not Markdown (I've seen people complaining) that they have to convert it back).
If you use the WMD editor, you can show the user how the outputs will look like after being converted to HTML. Even though Markdown syntax is really simple, it is not hard to make mistakes. Hence, it is best to show users the output.
Another reason for using Markdown instead of a WYSIWIG control - a WYSIWIG control allows the user to use HTML in data you are displaying on your web page. So, you have to be the one who decides when there is simply incorrect HTML and when it is an evil XSS/CSRF/whatever injection. In Markdown, you simply convert *something* to <b>something</b>, remove any unknow HTML elements and you're done.

How do you handle translation of text with markup?

I'm developing multi-language support for our web app. We're using Django's helpers around the gettext library. Everything has been surprisingly easy, except for the question of how to handle sentences that include significant HTML markup. Here's a simple example:
Please log in to continue.
Here are the approaches I can think of:
Change the link to include the whole sentence. Regardless of whether the change is a good idea in this case, the problem with this solution is that UI becomes dependent on the needs of i18n when the two are ideally independent.
Mark the whole string above for translation (formatting included). The translation strings would then also include the HTML directly. The problem with this is that changing the HTML formatting requires changing all the translation.
Tightly couple multiple translations, then use string interpolation to combine them. For the example, the phrase "Please %s to continue" and "log in" could be marked separately for translation, then combined. The "log in" is localized, then wrapped in the HREF, then inserted into the translated phrase, which keeps the %s in translation to mark where the link should go. This approach complicates the code and breaks the independence of translation strings.
Are there any other options? How have others solved this problem?
Solution 2 is what you want. Send them the whole sentence, with the HTML markup embedded.
Reasons:
The predominant translation tool, Trados, can preserve the markup from inadvertent corruption by a translator.
Trados can also auto-translate text that it has seen before, even if the content of the tags have changed (but the number of tags and their position in the sentence are the same). At the very least, the translator will give you a good discount.
Styling is locale-specific. In some cases, bold will be inappropriate in Chinese or Japanese, and italics are less commonly used in East Asian languages, for example. The translator should have the freedom to either keep or remove the styles.
Word order is language-specific. If you were to segment the above sentence into fragments, it might work for English and French, but in Chinese or Japanese the word order would not be correct when you concatenate. For this reason, it is best i18n practice to externalize entire sentences, not sentence fragments.
2, with a potential twist.
You certainly could localize the whole string, like:
loginLink=Please log in to continue
However, depending on your tooling and your localization group, they might prefer for you to do something like:
// tokens in this string add html links
loginLink=Please {0}log in{1} to continue
That would be my preferred method. You could use a different substitution pattern if you have localization tooling that ignores certain characters. E.g.
loginLink=Please %startlink%log in%endlink% to continue
Then perform the substitution in your jsp, servlet, or equivalent for whatever language you're using ...
Disclaimer: I am not experienced in internationalization of software myself.
I don't think this would be good in any case - just introduces too much coupling …
As long as you keep formatting sparse in the parts which need to be translated, this could be okay. Giving translators the possibility to give special words importance (by either making them a link or probably using <strong /> emphasis sounds like a good idea. However, those translations with (X)HTML possibly cannot be used anywhere else easily.
This sounds like unnecessary work to me …
If it were me, I think I would go with the second approach, but I would put the URI into a formatting parameter, so that this can be changed without having to change all those translations.
Please log in to continue.
You should keep in mind that you may need to teach your translators a basic knowledge of (X)HTML if you go with this approach, so that they do not screw up your markup and so that they know what to expect from that text they write. Anyhow, this additional knowledge might lead to a better semantic markup, because, as mentioned above, texts could be translated and annotated with (X)HTML to reflect local writing style.
What ever you do keep the whole sentence as one string. You need to understand the whole sentece to translate it correctly.
Not all words should be translated in all languages: e.g. in Norwegian one doesn't use "please" (we can say "vær så snill" literally "be so kind" but when used as a command it sounds too forceful) so the correct norwegian vould be:
"Logg inn for å fortsette" lit.: "Log in to continue" or
"Fortsett ved å logge inn" lit.: "Continue by to log in" etc.
You must allow completely changing the order, e.g. in a fictional demo language:
"Für kontinuer Loggen bitte ins" (if it was real) lit.: "To continue log please in"
Some language may even have one single word for (most of) this sentence too...
I'll recommend solution 1 or possibly "Please %{startlink}log in%{endlink} to continue" this way the translator can make the whole sentence a link if that's more natural, and it can be completely restructured.
Interesting question, I'll be having this problem very soon. I think I'll go for 2, without any kind of tricky stuff. HTML markup is simple, urls won't move anytime soon, and if anything is changed a new entry will be created in django.po, so we get a chance to review the translation ( ex: a script should check for empty translations after makemessages ).
So, in template :
{% load i18n %}
{% trans 'hello world' %}
... then, after python manage.py makemessages I get in my django.po
#: templates/out.html:3
msgid "hello world"
msgstr ""
I change it to my needs
#: templates/out.html:3
msgid "hello world"
msgstr "bonjour monde"
... and in the simple yet frequent cases I'll encounter, it won't be worth any further trouble. The other solutions here seems quite smart but I don't think the solution to markup problems is more markup. Plus, I want to avoid too much confusing stuff inside templates.
Your templates should be quite stable after a while, I guess, but I don't know what other trouble you expect. If the content changes over and over, perhaps that content's place is not inside the template but inside a model.
Edit: I just checked it out in the documentation, if you ever need variables inside a translation, there is blocktrans.
Makes no sense, how would you translate "log in"?
I don't think many translators have experience with HTML (the regular non-HTML-aware translators would be cheaper)
I would go with option 3, or use "Please %slog in%s to continue" and replace the %s with parts of the link.

Best practices for internationalizing web applications?

Internationalizing web apps always seems to be a chore. No matter how much you plan ahead for pluggable languages, there's always issues with encoding, funky phrasing that doesn't fit your templates, and other problems.
I think it would be useful to get the SO community's input for a set of things that programmers should look out for when deciding to internationalize their web apps.
Internationalization is hard, here's a few things I've learned from working with 2 websites that were in over 20 different languages:
Use UTF-8 everywhere. No exceptions. HTML, server-side language (watch out for PHP especially), database, etc.
No text in images unless you want a ton of work. Use CSS to put text over images if necessary.
Separate configuration from localization. That way localizers can translate the text and you can deal with different configurations per locale (features, layout, etc). You don't want localizers to have the ability to mess with your app.
Make sure your layouts can deal with text that is 2-3 times longer than English. And also 50% less than English (Japanese and Chinese are often shorter).
Some languages need larger font sizes (Japanese, Chinese)
Colors are locale-specific also. Red and green don't mean the same thing everywhere!
Add a classname that is the locale name to the body tag of your documents. That way you can specify a specific locale's layout in your CSS file easily.
Watch out for variable substitution. Don't split your strings. Leave them whole like this: "You have X new messages" and replace the 'X' with the #.
Different languages have different pluralization. 0, 1, 2-4, 5-7, 7-infinity. Hard to deal with.
Context is difficult. Sometimes localizers need to know where/how a string is used to make sure it's translated correctly.
Resources:
http://interglacial.com/~sburke/tpj/as_html/tpj13.html
http://www.ryandoherty.net/2008/05/26/quick-tips-for-localizing-web-apps/
http://ed.agadak.net/2007/12/one-potato-two-potato-three-potato-four
In my company all our strings are stored in *.properties files. Our build tools build a "test languange" copy of the properties files, which replace a string like this:
Click here
with something like this:
[~~ Çļïčк н∑ѓё ~~ タウ ~~]
Now, when we set the language to "test" in our config files, these properties files are used. (And of course we don't ship the test language files).
This allows us to:
Make sure that Unicode characters are displayed correctly, including Japanese/Chinese/Korean.
Make sure that the layout scales appropriately for languages with longer words (German in particular has longer words on average than English).
Spot any hard-coded strings (as they will be in plain-English).
As for the actual translation, this is done by professional translators, not developers.
As an English person living abroad I have become frustrated by many web application's approach to internationalization and have blogged about my frustrations.
My tips would be:
think about how you show an international version of a page
using geolocation might work for many users, but as my examples show for many it will not
why not use the Accept-Language header for determining which language to serve
if a user accesses a page via a search engine then don't redirect them somewhere else e.g. to a homepage in a different language
it's extremely annoying to change language and have a different page reload - either serve the same page or warn the user that the current content is not available in a different language before redirecting them
English is a very common language, so perhaps default to that
But make sure the change language option is clear on the GUI (I like what Google Maps are doing, as shown in the post)
All I see on the Web is companies getting internalization wrong. Getting it right from a user's perspective is tricky indeed.
I have a couple apps that are "bilingual"
I used resource files in ASP.NET1.1
There is also something called the String Resource Tool
Basically you put all your strings in a .RES file for both languages and then determine what file to read from based on Culture or whether someone clicked a Link for the language
The biggest gotcha is making sure the Translations are done correctly