CSS: text-transform not working properly for Turkish characters - html

The implementations of the major browsers seem to have problems with text-transform: uppercase with Turkish characters. As far as I know (I'm not Turkish.) there are four different i characters: ı i I İ where the last two are the uppercase representations of the former two.
However applying text-transform:uppercase to ı i, the browsers (checked IE, Firefox, Chrome and Safari) results in I I which is not correct and may change the meaning of the words so much so that they become insults. (That's what I've been told)
As my research for solutions did not reveal any my question is: Are there workarounds for this issue? The first workaround might be to remove text-transform: uppercase entirely but that's some sort of last resort.
Funny thing, the W3C has tests for this problem on their site, but lack of further information about this issue. http://www.w3.org/International/tests/tests-html-css/tests-text-transform/generate?test=5
I appreciate any help and looking forward to your answers :-)
Here's a codepen

You can add lang attribute and set its value to tr to solve this:
<html lang="tr"> or <div lang="tr">
Here is working example.

Here's a quick and dirty workaround example - it's faster than I thought (tested in a document with 2400 tags -> no delay). But I see that js workarounds are not the very best solution
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-3">
</head>
<body>
<div style="text-transform:uppercase">a b c ç d e f g ğ h ı i j k l m n o ö p r s ş t u ü v y z (source)</div> <div>A B C Ç D E F G Ğ H I İ J K L M N O Ö P R S Ş T U Ü V Y Z (should be like this)</div>
<script>
function getStyle(element, style) {
var result;
if (document.defaultView && document.defaultView.getComputedStyle) {
result = document.defaultView.getComputedStyle(element, '').getPropertyValue(style);
} else if(element.currentStyle) {
style = style.replace(/\-(\w)/g, function (strMatch, p1) {
return p1.toUpperCase();
});
result = element.currentStyle[style];
}
return result;
}
function replaceRecursive(element) {
if (element && element.style && getStyle(element, 'text-transform') == 'uppercase') {
element.innerHTML = element.innerHTML.replace(/ı/g, 'I');
element.innerHTML = element.innerHTML.replace(/i/g, 'İ'); // replaces 'i' in tags too, regular expression should be extended if necessary
}
if (!element.childNodes || element.childNodes.length == 0) return;
for (var n in element.childNodes) {
replaceRecursive(element.childNodes[n]);
}
}
window.onload = function() { // as appropriate 'ondomready'
alert('before...');
replaceRecursive(document.getElementsByTagName('body')[0]);
alert('...after');
}
</script>
</body>
</html>

Here's my enhanced version of alex's code that I am using in production:
(function($) {
function getStyle(element, style) {
var result;
if (document.defaultView && document.defaultView.getComputedStyle) {
result = document.defaultView.getComputedStyle(element, '').getPropertyValue(style);
} else if(element.currentStyle) {
style = style.replace(/\-(\w)/g, function (strMatch, p1) {
return p1.toUpperCase();
});
result = element.currentStyle[style];
}
return result;
}
function replaceRecursive(element, lang) {
if(element.lang) {
lang = element.lang; // Maintain language context
}
if (element && element.style && getStyle(element, 'text-transform') == 'uppercase') {
if (lang == 'tr' && element.value) {
element.value = element.value.replace(/ı/g, 'I');
element.value = element.value.replace(/i/g, 'İ');
}
for (var i = 0; i < element.childNodes.length; ++i) {
if (lang == 'tr' && element.childNodes[i].nodeType == Node.TEXT_NODE) {
element.childNodes[i].textContent = element.childNodes[i].textContent.replace(/ı/g, 'I');
element.childNodes[i].textContent = element.childNodes[i].textContent.replace(/i/g, 'İ');
} else {
replaceRecursive(element.childNodes[i], lang);
}
}
} else {
if (!element.childNodes || element.childNodes.length == 0) return;
for (var i = 0; i < element.childNodes.length; ++i) {
replaceRecursive(element.childNodes[i], lang);
}
}
}
$(document).ready(function(){ replaceRecursive(document.getElementsByTagName('html')[0], ''); })
})(jQuery);
Note that I am using jQuery here only for the ready() function. The jQuery compatibility wrapper is also as a convenient way to namespace the functions. Other than that, the two functions do not rely on jQuery at all, so you could pull them out.
Compared to alex's original version this one solves a couple problems:
It keeps track of the lang attribute as it recurses through, since if you have mixed Turkish and other latin content you will get improper transforms on the non-Turkish without it. Pursuant to this I pass in the base html element, not the body. You can stick lang="en" on any tag that is not Turkish to prevent improper capitalization.
It applies the transformation only to TEXT_NODES because the previous innerHTML method did not work with mixed text/element nodes such as labels with text and checkboxes inside them.
While having some notable deficiencies compared to a server side solution, it also has some major advantages, the chief of which is guaranteed coverage without the server-side having to be aware of what styles are applied to what content. If any of the content is being indexed and shown in Google summaries (for example) it is much better if it stays lowercase when served.

The next version of Firefox Nightly (which should become Firefox 14) has a fix for this problem and should handle the case without any hack (as the CSS3 specs request it).
The gory details are available in that bug : https://bugzilla.mozilla.org/show_bug.cgi?id=231162
They also fixed the problem for font-variant I think (For those not knowing what font-variant does, see https://developer.mozilla.org/en/CSS/font-variant , not yet up-to-date with the change but the doc is browser-agnostic and a wiki, so...)

The root cause of this problem must be incorrect handling of these turkish characters by unicode library used in all these browsers. So I doubt there is an front-end-side fix for that.
Someone has to report this issue to the developers of these unicode libs, and it would be fixed in few weeks/months.

If you can't rely on text-transform and browsers you will have to render your text in uppercase yourself on the server (hope you're not uppercasing the text as the user types it).
You should have a better support for internationalisation there.

This work-around requires some Javascript. If you don't want to do that, but have something server side that can preprocess the text, this idea will work there too (I think).
First, detect if you are running in Turkish. If you are, then scan whatever you are going to uppercase to see if it contains the problem characters. If they do, replace all of those characters with the uppercase version of them. Then apply the uppercase CSS. Since the problem characters are already uppercase, that should be a totally fine (ghetto) work around. For Javascript, I envision having to deal with some .innerHTML on your impacted elements.
Let me know if you need any implementation details, I have a good idea of how to do this in Javascript using Javascript string manipulation methods. This general idea should get you most of the way there (and hopefully get me a bounty!)
-Brian J. Stinar-

Related

Show ordered list in Russian alphabets/numerics [duplicate]

There are many possible values for list-style-type CSS property (e. g. decimal, lower-latin, upper-greek and so on). However there are none for the Cyrillic alphabet (which, btw, has different variations for different languages).
What is the best way to style an ordered list with Cyrillic letters?
(I'm providing a solution I ended up with despite I'm not very happy with it.)
I know nothing about Cyrillic list schemes so I’m at risk of a bit of cultural embarrassment here, but CSS3 Lists module (still in working draft) defines quite a few Cyrillic alphabetic list types: lower-belorussian, lower-bulgarian, lower-macedonian, lower-russian, lower-russian-full, lower-serbo-croatian, lower-ukrainian, lower-ukrainian-full, upper-belorussian, upper-bulgarian, upper-macedonian, upper-russian, upper-russian-full, upper-serbo-croatian, upper-ukrainian, upper-ukrainian-full. As expected, the state of support for these is deplorable currently (certainly nothing in Gecko or WebKit), but hopefully going forwards these will start to be implemented.
Update: some changes have been made – the definition of list types has been moved into the CSS3 Counter Styles module whose current draft (Feb 2015) has unfortunately lost all alphabetical Cyrillic types. This is in Candidate Recommendation stage so it’s unlikely that additions will be made at the point. Perhaps in CSS4 List Styles?
In this method I'm using CSS-generated content in before each list item.
.lower-ukrainian {
list-style-type: none;
}
.lower-ukrainian li:before {
display: inline-block;
margin-left: -1.5em;
margin-right: .55em;
text-align: right;
width: .95em;
}
.lower-ukrainian li:first-child:before {
content: "а.";
}
.lower-ukrainian li:nth-child(2):before {
content: "б.";
}
/* and so on */
Disadvantages
Hardcoded, restrict list to a certain max length.
Not pixel-perfect as compared to a regular order list
Here is another solution for Cyrillic letters with pretty clear code: jsfiddle.net
(() => {
const selector = 'ol.cyrillic',
style = document.createElement('style');
document.head.appendChild( style );
'абвгдежзиклмнопрстуфхцчшщэюя'.split('').forEach((c, i) =>
style.sheet.insertRule(
`${selector} > li:nth-child(${i+1})::before {
content: "${c})"
}`, 0)
);
})();
PS. You can convert this next-gen code to old one with Babel: babeljs.io
I'm surprised that there is no Cyrillic numbering. Here's a quick JS solution for you:
function base_convert(n, base) {
var dictionary = '0123456789abcdefghijklmnopqrstuvwxyz';
var m = n.toString(base);
var digits = [];
for (var i = 0; i < m.length; i++) {
digits.push(dictionary.indexOf(m.charAt(i)) - 1);
}
return digits;
}
var letters = {
'russian': {
'lower': 'абвгдеёжзийклмнопрстуфхцчшщъыьэюя',
'upper': 'АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ'
}
}
$('ul, ol').each(function() {
if (!(results = $(this).prop('class').match(/(upper|lower)-([a-z]+)/i))) return;
var characters = letters[results[2]][results[1]];
$('> li', this).each(function(index, element) {
var number = '', converted = base_convert(++index, characters.length);
for (var i = 0; i < converted.length; i++) {
number += characters.charAt(converted[i]);
}
$(this).attr('data-letter', number);
});
});​
My written Russian is admittedly bad, as you can see by my inability to count with letters, so change the letters object appropriately.
Demo: http://jsfiddle.net/JFFqn/14/

IE8 select population performance issues - solutions needed

I have an application that is having issue when populating selects with over 100 items. This problem only occurs in IE8. I am using angularjs to do the population, but my research shows that this is a general problem with IE8. What solutions have others used to deal with this problem. We have over 40,000 users tied to IE8 for the foreseeable future (Fortune 200 company) so moving to another browser is not an option.
Some thoughts I had.
Create a series of option tags as a one long string in memory and replace the innerHTML of the . But running some people samples this does not appear to solve the issue.
Originally populating the select with a few and then adding the rest as the user scrolls down. I am not sure if this is possible, or how to implement this
I am sure others have run into this issue. Does anyone have some ideas?
Thanks,
Jerry
Another solution that preserves the original <select> is to set the <option> values after adding the options to the <select>.
Theory
Add the <option> elements to a document fragment.
Add the document fragment to the <select>.
Set the value for each <option>.
Practice
In practice we end up with a couple issues we have to work around to get this to work:
IE11 is very slow when setting the value for each individual <option>.
IE8 has selection bugs because it isn't properly doing a re-flow/layout on the <select>.
Result
To handle these what we really do is something like the following:
Add the <option> tags to a document fragment. Make sure to set the values so that step 3 is a no-op in IE11.
Add the document fragment to the <select>.
Set the value for each <option>. In IE8 this will set the values, in IE11 this is a no-op.
In a setTimeout add and remove a dummy <option>. This forces a re-flow.
Code
function setSelectOptions(select, options)
{
select.innerHTML = ''; // Blank the list.
// 1. Add the options to a document fragment.
var docFrag = document.createDocumentFragement();
for (var i = 0; i < options.length; i++)
{
var opt = document.createElement('option');
opt.text = options[i];
docFrag.appendChild(opt);
}
// 2. Add the document fragment to the select.
select.appendChild(docFrag);
// 3. Set the option values for IE8. This is a no-op in IE11.
for (i = 0; i < options.length; i++)
select.options[i].text = options[i];
// 4. Force a re-flow/layout to fix IE8 selection bugs.
window.setTimeout(function()
{
select.add(document.createElement('option'));
select.remove(select.options.length - 1);
}, 0);
}
The best solution seems to be to create the Select and it's options as a text string and add that string as the innerHTML of the containing tag such as a DIV. Below is some code.
<div id="selectHome" ></div>
In JS (from angular controller)
function insertSelect(divForSelect) {
var str = "<select id='myselect'>";
for (var i = 0; i < data.length; i++) {
str += '<option>' + data[i] + '</data>';
}
str += '</select>';
divForSelect.innnerHTML = str;
}
Note that inserting options into a existing Select is very slow (8,000 msecs for 2000 items). But, if the select and the options are inserted as a single string it is very fast (12 msec for 2000 items).

Google Apps Script: weird page layout in a script formatted document

I'm working on a script that applies custom headings to a plain text document imported in Google Docs. The scripts works pretty much as it should. However the resulting document has a weird layout, as if random page breaks were inserted here and there. But there are no page breaks and I can't understand the reason of this layout. Checking the paragraph attributes give me no hints on what is wrong.
Here is the text BEFORE the script is applied:
https://docs.google.com/document/d/1MzFvlkG13i3rrUcz5jmmSppG4sBH6zTXr7RViwdqaIo/edit?usp=sharing
You can make a copy of the document and execute the script (from the Scripts menu, choose Apply Headings). The script applies the appropriate heading to the scene heading, name of the character, dialogue, etc.
As you can see, at the bottom of page 2 and 3 of the resulting document there is a big gap and I can't figure out why. The paragraph attributes seem ok to me...
Here is a copy of the script:
// Apply headings to sceneheadings, actions, characters, dialogues, parentheticals
// to an imported plain text film script;
function ApplyHeadings() {
var pars = DocumentApp.getActiveDocument().getBody().getParagraphs();
for(var i=0; i<pars.length; i++) {
var par = pars[i];
var partext = par.getText();
var indt = par.getIndentStart();
Logger.log(indt);
if (indt > 100 && indt < 120) {
var INT = par.findText("INT.");
var EXT = par.findText("EXT.");
if (INT != null || EXT != null) {
par.setHeading(DocumentApp.ParagraphHeading.HEADING1);
par.setAttributes(ResetAttributes());
}
else {
par.setHeading(DocumentApp.ParagraphHeading.NORMAL);
par.setAttributes(ResetAttributes());
}
}
else if (indt > 245 && indt < 260) {
par.setHeading(DocumentApp.ParagraphHeading.HEADING2);
par.setAttributes(ResetAttributes());
}
else if (indt > 170 && indt < 190) {
par.setHeading(DocumentApp.ParagraphHeading.HEADING3);
par.setAttributes(ResetAttributes());
}
else if (indt > 200 && indt < 240) {
par.setHeading(DocumentApp.ParagraphHeading.HEADING4);
par.setAttributes(ResetAttributes());
}
}
}
// Reset all the attributes to "null" apart from HEADING;
function ResetAttributes() {
var style = {};
style[DocumentApp.Attribute.STRIKETHROUGH] = null;
style[DocumentApp.Attribute.HORIZONTAL_ALIGNMENT] = null;
style[DocumentApp.Attribute.INDENT_START] = null;
style[DocumentApp.Attribute.INDENT_END] = null;
style[DocumentApp.Attribute.INDENT_FIRST_LINE] = null;
style[DocumentApp.Attribute.LINE_SPACING] = null;
style[DocumentApp.Attribute.ITALIC] = null;
style[DocumentApp.Attribute.FONT_SIZE] = null;
style[DocumentApp.Attribute.FONT_FAMILY] = null;
style[DocumentApp.Attribute.BOLD] = null;
style[DocumentApp.Attribute.SPACING_BEFORE] = null;
style[DocumentApp.Attribute.SPACING_AFTER] = null;
return style;
}
A couple of screenshots to make the problem more clear.
This is page 2 of the document BEFORE the script is applied.
This is page two AFTER the script is applied. Headings are applied correctly but... Why the white space at the bottom?
Note: if you manually re-apply HEADING2 to the first paragraph of page 3 (AUDIO TV), the paragraph will jump back to fill the space at the bottom of page 2. This action, however, doesn't change any attribute in the paragraph. So why the magic happens?
Thanks a lot for your patience.
That was an interesting problem ;-)
I copied your doc, ran the script and had a surprise : nothing happened !
It took me a few minutes to realize that the copy I just made had no style defined for headings, everything was for some reason in courrier new 12pt, including the headings.
I examined the log and saw the indent values, played with that a lot to finally see that the headings were there but not changing the style.
So I went in the doc menu and set 'Use my default style and... everything looks fine, see screen capture below.
So now your question : it appears that there must be something wrong in your style definition, by "wrong" I mean something that changes more than just the font Style and size but honestly I can't see any way to guess what since I'm unable to reproduce it... Please try resetting your heading styles and re-define your default.... and tell us what happens then.
PS : here are my default heading styles : (and the url of my copy in view only :https://docs.google.com/document/d/1yP0RRCrRSsQc9zCk-sdfu5olNGDkoIrabXanII4qUG0/edit?usp=sharing )

highlight words in html using regex & javascript - almost there

I am writing a jquery plugin that will do a browser-style find-on-page search. I need to improve the search, but don't want to get into parsing the html quite yet.
At the moment my approach is to take an entire DOM element and all nested elements and simply run a regex find/replace for a given term. In the replace I will simply wrap a span around the matched term and use that span as my anchor to do highlighting, scrolling, etc. It is vital that no characters inside any html tags are matched.
This is as close as I have gotten:
(?<=^|>)([^><].*?)(?=<|$)
It does a very good job of capturing all characters that are not in an html tag, but I'm having trouble figuring out how to insert my search term.
Input: Any html element (this could be quite large, eg <body>)
Search Term: 1 or more characters
Replace Txt: <span class='highlight'>$1</span>
UPDATE
The following regex does what I want when I'm testing with http://gskinner.com/RegExr/...
Regex: (?<=^|>)(.*?)(SEARCH_STRING)(?=.*?<|$)
Replacement: $1<span class='highlight'>$2</span>
However I am having some trouble using it in my javascript. With the following code chrome is giving me the error "Invalid regular expression: /(?<=^|>)(.?)(Mary)(?=.?<|$)/: Invalid group".
var origText = $('#'+opt.targetElements).data('origText');
var regx = new RegExp("(?<=^|>)(.*?)(" + $this.val() + ")(?=.*?<|$)", 'gi');
$('#'+opt.targetElements).each(function() {
var text = origText.replace(regx, '$1<span class="' + opt.resultClass + '">$2</span>');
$(this).html(text);
});
It's breaking on the group (?<=^|>) - is this something clumsy or a difference in the Regex engines?
UPDATE
The reason this regex is breaking on that group is because Javascript does not support regex lookbehinds. For reference & possible solutions: http://blog.stevenlevithan.com/archives/mimic-lookbehind-javascript.
Just use jQuerys built-in text() method. It will return all the characters in a selected DOM element.
For the DOM approach (docs for the Node interface): Run over all child nodes of an element. If the child is an element node, run recursively. If it's a text node, search in the text (node.data) and if you want to highlight/change something, shorten the text of the node until the found position, and insert a highligth-span with the matched text and another text node for the rest of the text.
Example code (adjusted, origin is here):
(function iterate_node(node) {
if (node.nodeType === 3) { // Node.TEXT_NODE
var text = node.data,
pos = text.search(/any regular expression/g), //indexOf also applicable
length = 5; // or whatever you found
if (pos > -1) {
node.data = text.substr(0, pos); // split into a part before...
var rest = document.createTextNode(text.substr(pos+length)); // a part after
var highlight = document.createElement("span"); // and a part between
highlight.className = "highlight";
highlight.appendChild(document.createTextNode(text.substr(pos, length)));
node.parentNode.insertBefore(rest, node.nextSibling); // insert after
node.parentNode.insertBefore(highlight, node.nextSibling);
iterate_node(rest); // maybe there are more matches
}
} else if (node.nodeType === 1) { // Node.ELEMENT_NODE
for (var i = 0; i < node.childNodes.length; i++) {
iterate_node(node.childNodes[i]); // run recursive on DOM
}
}
})(content); // any dom node
There's also highlight.js, which might be exactly what you want.

How to find specific value in a large object in node.js?

Actually I've parsed a website using htmlparser and I would like to find a specific value inside the parsed object, for example, a string "$199", and keep tracking that element(by periodic parsing) to see the value is still "$199" or has changed.
And after some painful stupid searching using my eyes, I found the that string is located at somewhere like this:
price = handler.dom[3].children[3].children[3].children[5].children[1].
children[3].children[3].children[5].children[0].children[0].raw;
So I'd like to know whether there are methods which are less painful? Thanks!
A tree based recursive search would probably be easiest to get the node you're interested in.
I've not used htmlparser and the documentation seems a little thin, so this is just an example to get you started and is not tested:
function getElement(el,val) {
if (el.children && el.children.length > 0) {
for (var i = 0, l = el.children.length; i<l; i++) {
var r = getElement(el.children[i],val);
if (r) return r;
}
} else {
if (el.raw == val) {
return el;
}
}
return null;
}
Call getElement(handler.dom[3],'$199') and it'll go through all the children recursively until it finds an element without an children and then compares it's raw value with '$199'. Note this is a straight comparison, you might want to swap this for a regexp or similar?