View webpage in Hebrew language - html

I have a website with few HTML pages. How can I display them in the Hebrew language?
What are the steps that I should follow to ensure it is viewed in different languages for different countries?
Thanks

It would be quite helpful if this question is rewritten.
What I believe that you are trying to ask is how to detect the user's language.
You can do this through headers. Most browsers add on the BrowserLanguage header tag when you go to a page. [This is customized through the browser's properties]
If you are dealing with static HTML pages you would need to setup redirection with your server to display the language appropriate site.

You need to:
Be able to get content in that language (don't depend on automated translation services)
Select a character encoding (UTF-8 is generally the best choice), see http://www.w3.org/International/tutorials/tutorial-char-enc/
Ensure your editor saves using that encoding
Ensure your server specifies that it is using that encoding
Ensure that nothing mangles the encoding between the editor and server (such as by being inserted into a database that is configured to use a different encoding)

Related

How to specify a language across an entire website?

For context: I am not a web developer or programmer of any sort and I use a screen-reader.
I am using an accessibility evaluation tool across some sites I'd like to visit, which, however, have some accessibility issues.
From some googling, I see that it is fairly easy to specify a single webpages language by using <HTML lang='en'></html> for example for English.
How easy would it be to implement this across an entire site with hundreds of separate pages? This is a very vague question without linking to the specific website. For example, a car website with a separate page for individual cars with each page missing a specified language. Would it be fairly simple to specify the language across all automatically or would one have to manually copy/paste <HTML lang='en'></html> for each page? Or does it heavily depend on how the website was structured HTML-wise? etc.
Thank you
You can define the content language sitewide without modifying the HTML using the HTTP header:
Content-Language: de-DE
This is described in the WCAG : Specifying the default language in the HTTP header
This header should be used as fallback by browsers when no lang attribute is declared.
The HTML5 specification says that if there is no lang attribute on the html tag, and if there is no meta element with the http-equiv attribute set to Content-Language, and if there is only a single language tag in the HTTP header declaration, then a browser may use that information to guess at the default language of the text in the page.
But as noted in the WCAG, this might not be always sufficient:
Note that the Content-Language HTTP header does not serve to identify the language used for processing the content. The content processing language can be identified by means of other techniques, such as the attributes lang and xml:lang in markup languages.
The technique Using the language attribute on the HTML element must be applied and specifiying the lang attribute is something that you won't do an a per-webpage basis, but modifying the site template.
And yes, this is a programming question, a beginner one, but still a programming one
To answer the underlying programming question: Yes, usually changing one file is enough.
You can safely assume that the HTML files that make a website are generated by either a Static Site Generator in advance, or by Content Management System (CMS) on the server when a page is requested.
To generate these files, templates are used. Templates give a common structure and contain variables, which will be replaced by content from the respective page or component.
Usually, there is one main, high-level HTML template, which provides the basic HTML structure with the opening HTML tag <html>.
So usually it suffices to change only one file to include the language attribute, by writing <html lang="en"> in that template, or using a variable for a multi-lingual site <html lang="{{language}}"> (syntax depends on your generator).

Safe Way to Include User Text Input in HTML

This feels like an easy one, but I'm having trouble finding the right search terms to get me what I need...
I have a requirement for part of my web page to display a previously-entered note from a user. The note is saved in the database, and I am currently incorporating it using Razor like this:
<span>#Model.UserNote</span>
This works fine, but it gets my spidey senses tingling... what if the user decides that he wants his note to be something like "</span><script>...</script><span>". I know how to use parameters to avoid injection attacks in SQL Server, but is there an HTML equivalent or another approach to avoid saving or injecting malicious markup in HTML? Displaying the text in a control like a textbox feels safer, but may not give me the visual appearance that I am looking for. Thanks in advance!
The thing you want to search for is cross-site scripting (xss).
The general solution is to encode output according to its context. For example if you are writing such data into plain html, you need html encoding, which is basically replacing < with & lt; and so on in dynamic data (~user input), so that everything only gets rendered as text. For a javascript context (for example but not only inside a <script> tag) you would need javascript encoding.
In .net, there is HttpUtility that includes such methods, eg. HttpUtility.JavascriptStringEncode(). Also there is the formerly separate AntiXSS library that can help by providing even stricter (whitelist-based) encoding, as opposed to the blacklist-based HttpUtility. So don't roll your own, it's trickier than it may first appear - just use a well-known implementation.
Also Razor has built-in protection against trivial xss attack vectors. By using #myVar, Razor automatically applies html encoding, so your code above is secure. Note that it would not be secure in a javascript context, where you need to apply javascript encoding yourself (ie. call the relevant method from HttpUtility for instance).
Note that without proper encoding, it is not more secure to use an input field or a textarea - an injection is an injection, doesn't matter much what characters need to be used if injection is possible.
Also slightly related, .net provides another protection besides the automatic html encoding. It uses "request validation", and by default won't allow request parameters (either get or post) to contain a less than character (<), immediately followed by a letter. Such a request would be blocked by the framework as potentially unsafe, unless this feature is deliberately turned off.
Your original example is blocked by both of these mechanisms (automatic encoding and request validation).
It's very important to note though, that in terms of xss, this is the very tip of the iceberg. While these protections in .net help somewhat, they are by no means sufficient in general. While your example is secure, in general you need to understand xss and what exactly these protections do to be able to produce secure code.

How to display all non-English characters correctly in a web site?

It's annoying to see even the most professional sites do it wrong. Posted text turns into something that's unreadable. I don't have much information about encodings. I just want to know about the problem that's making such a basic thing so hard.
Does HTTP encoding limit some
characters?
Do users need to send info about the
charset/encoding they are using?
Assuming everything arrives to the
server as it is, is encoding used
saving that text causing the problem?
Is it something about browser
implementations?
Do we need some JavaScript tricks to
make it work?
Is there an absolute solution to this? It may have its limits but StackOverflow seems to make it work.
I suspect one needs to make sure that the whole stack handles the encoding with care:
Specify a web page font (CSS) that supports a wide range of international characters.
Specify a correct lang/charset HTML tag attributes and make sure that the Browser is using the correct encoding.
Make sure the HTTP requests are send with the appropriate charset specified in the headers.
Make sure the content of the HTTP requests is decoded properly in your web request handler
Configure your database/datastore with a internationalization-friendly encoding/Collation (such as UTF-9/UTF-16) and not one that just supports latin characters (default in some DBs).
The first few are normally handled by the browser and web framework of choice, but if you screw up the DB encoding or use a font with limited character set there will be no one to save you.

How will you customise a html page so that it accepts multiple language?

How will you customise a html page so that it accepts multiple language?
I will cite W3 Internationalization Quick Tips for the Web :
Encoding. Use Unicode wherever possible for content, databases, etc. Always declare the encoding of content.
Escapes. Use characters rather than escapes (e.g. á á or á) whenever you can.
Language. Declare the language of documents and indicate internal language changes.
Presentation vs. content. Use style sheets for presentational information. Restrict markup to semantics.
Images, animations & examples. Check for translatability and inappropriate cultural bias.
Forms. Use an appropriate encoding on both form and server. Support local formats of names/addresses, times/dates, etc.
Text authoring. Use simple, concise text. Use care when composing sentences from multiple strings.
Navigation. On each page include clearly visible navigation to localized pages or sites, using the target language.
Right-to-left text. For XHTML, add dir="rtl" to the html tag. Only re-use it to change the base direction.
Check your work. Validate! Use techniques, tutorials, and articles at http://www.w3.org/International/
For more information follow W3 recommendations : http://www.w3.org/International/
One way to do this would be to use a decent server-side web technology, there are many to choose from, which contains support for internationalization. Essentially it comes down to specifying the different pieces of text that the site needs to display, assigning a label to each message, creating different versions of each label in separate language files, and using the server-side code, reference the label name and a country code to display the text in the appropriate language.
The first step is to determine your requirements, your hosting environment and then figure out what options are available to you. If you can provide some more information we might be able to steer you in a better direction.
If I make a bunch of assumptions about what you are trying to achieve:
Serve the document as UTF-8
Browsers will tend to then return a UTF-8 response to the server when any forms are submitted (forms being the only way that a page is going to "accept" anything), and UTF-8 can handle the characters used in just about every language.

Best practices for internationalizing web applications?

Internationalizing web apps always seems to be a chore. No matter how much you plan ahead for pluggable languages, there's always issues with encoding, funky phrasing that doesn't fit your templates, and other problems.
I think it would be useful to get the SO community's input for a set of things that programmers should look out for when deciding to internationalize their web apps.
Internationalization is hard, here's a few things I've learned from working with 2 websites that were in over 20 different languages:
Use UTF-8 everywhere. No exceptions. HTML, server-side language (watch out for PHP especially), database, etc.
No text in images unless you want a ton of work. Use CSS to put text over images if necessary.
Separate configuration from localization. That way localizers can translate the text and you can deal with different configurations per locale (features, layout, etc). You don't want localizers to have the ability to mess with your app.
Make sure your layouts can deal with text that is 2-3 times longer than English. And also 50% less than English (Japanese and Chinese are often shorter).
Some languages need larger font sizes (Japanese, Chinese)
Colors are locale-specific also. Red and green don't mean the same thing everywhere!
Add a classname that is the locale name to the body tag of your documents. That way you can specify a specific locale's layout in your CSS file easily.
Watch out for variable substitution. Don't split your strings. Leave them whole like this: "You have X new messages" and replace the 'X' with the #.
Different languages have different pluralization. 0, 1, 2-4, 5-7, 7-infinity. Hard to deal with.
Context is difficult. Sometimes localizers need to know where/how a string is used to make sure it's translated correctly.
Resources:
http://interglacial.com/~sburke/tpj/as_html/tpj13.html
http://www.ryandoherty.net/2008/05/26/quick-tips-for-localizing-web-apps/
http://ed.agadak.net/2007/12/one-potato-two-potato-three-potato-four
In my company all our strings are stored in *.properties files. Our build tools build a "test languange" copy of the properties files, which replace a string like this:
Click here
with something like this:
[~~ Çļïčк н∑ѓё ~~ タウ ~~]
Now, when we set the language to "test" in our config files, these properties files are used. (And of course we don't ship the test language files).
This allows us to:
Make sure that Unicode characters are displayed correctly, including Japanese/Chinese/Korean.
Make sure that the layout scales appropriately for languages with longer words (German in particular has longer words on average than English).
Spot any hard-coded strings (as they will be in plain-English).
As for the actual translation, this is done by professional translators, not developers.
As an English person living abroad I have become frustrated by many web application's approach to internationalization and have blogged about my frustrations.
My tips would be:
think about how you show an international version of a page
using geolocation might work for many users, but as my examples show for many it will not
why not use the Accept-Language header for determining which language to serve
if a user accesses a page via a search engine then don't redirect them somewhere else e.g. to a homepage in a different language
it's extremely annoying to change language and have a different page reload - either serve the same page or warn the user that the current content is not available in a different language before redirecting them
English is a very common language, so perhaps default to that
But make sure the change language option is clear on the GUI (I like what Google Maps are doing, as shown in the post)
All I see on the Web is companies getting internalization wrong. Getting it right from a user's perspective is tricky indeed.
I have a couple apps that are "bilingual"
I used resource files in ASP.NET1.1
There is also something called the String Resource Tool
Basically you put all your strings in a .RES file for both languages and then determine what file to read from based on Culture or whether someone clicked a Link for the language
The biggest gotcha is making sure the Translations are done correctly