What's the rationale for not allowing multiline placeholders in HTML5? - html

I'm creating a very simple form that has a text area. The text area takes in a formatted block of names separated by newlines. To make the application slightly more usable, it would be nice if I could include a placeholder example that had multiple lines of text. Unfortunately, that doesn't seem to be possible with the HTML5 specification. Does anybody know why?

The placeholder attribute is like the <blockquote> element to me: it has a specific niche.
The placeholder attribute is mainly used in one-line form fields, not text areas.
How often do you use a carriage return in a one-line form field? Never.
The HTML5 specification puts it this way: "The placeholder attribute represents a short hint (a word or short phrase) intended to aid the user with data entry. A hint could be a sample value or a brief description of the expected format. The attribute, if specified, must have a value that contains no U+000A LINE FEED (LF) or U+000D CARRIAGE RETURN (CR) characters."
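To make the rule above concrete, here is a minimal Python sketch of a check a server-side template or build script could run on placeholder values. Both function names are hypothetical, and collapsing a multi-line example into one comma-separated line is just one possible workaround, not something the spec prescribes.

```python
def is_valid_placeholder(value: str) -> bool:
    """True if the value contains no LF (U+000A) or CR (U+000D), per the quoted rule."""
    return "\n" not in value and "\r" not in value


def to_single_line_hint(example: str, sep: str = ", ") -> str:
    """Hypothetical helper: join the non-blank lines of a sample into one line."""
    lines = [line.strip() for line in example.splitlines() if line.strip()]
    return sep.join(lines)
```

For example, a newline-separated list of names would become "Alice, Bob", which is a legal placeholder value.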
Since HTML5 is still fresh and new, and continues to be optimized and tweaked in various browsers, who knows what crazy cross-browser problems would appear if the placeholder attribute didn't have such strict guidelines?
The web seems to be moving toward helping designers and developers type less code and make fewer mistakes.
I've seen a few posts (by people like Paul Irish and Jeffrey Way) about omitting things like closing tags, and many standard elements have been made shorter and easier in HTML5 (e.g. <!doctype html>). Also, attributes that used to be required to make a webpage work well can now be thrown out altogether. The web is getting simpler and more complex at the same time.
All in all, though, if you want something to fix the dilemma (which, judging by the tone of your question, you seem to be suffering from), just use the title attribute instead. Refer to the accepted answer to this question:
Can you have multiline HTML5 placeholder text in a <textarea>?

Related

Store arbitrary characters in Semantic MediaWiki

I'm trying to store some text containing HTML tags in properties, which doesn't work. I created a form for a property with the data type 'text', and a template. Saving the form writes the text into the template, but it can't be displayed, I guess because it contains illegal characters.
What I'm trying to do:
I need a form to enter data containing HTML tags and special characters.
I'd like to be able to use a query to find all those pages and show that text using a template I provide to the ask query.
I also tried to use the free text option, but then I can't retrieve it using the ask query.
What would be the best, or at least a working solution to this?
Thanks a lot
Storing text with HTML tags is a bit tricky in Semantic MediaWiki.
The reason is the strip markers (UNIQ/QINU) invented by the MediaWiki developers.
When the content of a page with HTML tags in it is parsed, the parsing is sort of "postponed". This technical detail unfortunately makes it hard for extension developers such as the SMW developers to handle such content. It also makes it hard for lay people to follow the discussion on how to solve the problem.
Here are two SMW issues that are marked as "closed". That means following the configuration hints in the issues should solve your problem. If it doesn't, please ask a question on the SMW issue list, or even ask for the issues to be reopened.
https://github.com/SemanticMediaWiki/SemanticMediaWiki/pull/794
https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/3707
On my wiki we ran into this and resolved it by replacing special characters (we had issues with [, ] and =, but the same problem happens with the < and > of tags) with look-alike Unicode characters, using the regex extension and a template, before setting the property with {{#set:}}. If you want to display the formatted text on the wiki directly, call that parameter separately without replacing the characters.
When you want to display the property, run the reverse replacement with regex before displaying your now-intact code (use the template result format so you can perform the operation on the output of the query).
To switch to the substitute characters, you can create this template:
{{#regex:{{#regex:{{#regex:{{#regex:{{#regex:{{{1|}}}|/=/|꞊}}|/\[/|［}}|/\]/|］}}|/>/|≽}}|/</|≼}}
And to switch back, you can use this as a template:
{{#regex:{{#regex:{{#regex:{{#regex:{{#regex:{{{1|}}}|/꞊/|=}}|/［/|[}}|/］/|]}}|/≽/|>}}|/≼/|<}}
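If you want to check the character mapping outside the wiki, here is a Python sketch of the same round trip the two #regex templates perform. It assumes fullwidth brackets (U+FF3B and U+FF3D) as the substitutes for [ and ]; the other substitutes are the ones used in the templates. Note that the round trip is only lossless if the original text never contains the substitute characters themselves.

```python
# Wiki-significant characters -> look-alike substitutes (brackets are an assumption).
ENCODE = {
    "=": "\uA78A",  # MODIFIER LETTER SHORT EQUALS SIGN
    "[": "\uFF3B",  # FULLWIDTH LEFT SQUARE BRACKET
    "]": "\uFF3D",  # FULLWIDTH RIGHT SQUARE BRACKET
    ">": "\u227D",  # SUCCEEDS OR EQUAL TO
    "<": "\u227C",  # PRECEDES OR EQUAL TO
}
DECODE = {substitute: original for original, substitute in ENCODE.items()}


def encode(text: str) -> str:
    """Replace special characters before storing the text with {{#set:}}."""
    return text.translate(str.maketrans(ENCODE))


def decode(text: str) -> str:
    """Restore the original characters before displaying query results."""
    return text.translate(str.maketrans(DECODE))
```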

Markup-less way to title and link abbreviations/acronyms to glossary entries

Background: I'm writing a DocBook 5 document (and including in it some already-written text) with the intention of generating HTML from it. I would like to get the semantic markup correct from the beginning so I don't need to re-do it later, but the standard way does not seem to generate what I'm looking for, so I'm not sure if I should deviate from it or not, depending on what is possible with XSL.
Current setup: My glossary only has abbreviated items. It consists of glossentry elements, each containing a full-spelling glossterm and one or more acronyms and/or abbrevs. I suppose it could all just be glossterms instead; it doesn't matter to me. Suppose, for example, I have this:
<glossentry xml:id="ff"><glossterm>Firefox</glossterm>
<acronym>FF</acronym>
<glossdef>
<para>The web browser made by Mozilla.</para>
</glossdef>
</glossentry>
Ideal
Suppose that wherever I want to refer to Firefox, I put FF in the text. Ideally, without any additional markup, wherever "FF" (case-sensitive) appeared as a plain whole word in a paragraph (or title, but not, for example, in code, a programlisting, or a URL inside an attribute) in my DocBook file, it would come out in HTML as the text "FF", but marked up as a link to the glossary entry, without the standard link CSS, and with a title attribute whose value is Firefox. That way a reader can hover to get the acronym or abbreviation spelled out, and if that is insufficient, they can click to be taken to a fuller definition. Meanwhile I would style it black and underlined, so that readers know the feature is there but it doesn't distract their attention the way a normal link does, especially given how often it occurs in the text.
Main question: is such replacement of plaintext, markup-less terms even possible in XSL (without creating something like the Scunthorpe problem)? If so, can it do this for every acronym or abbrev found in the glossary, automatically?
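The core of the ideal is the matching rule: whole word, case-sensitive, so "FF" inside "FFmpeg" is left alone (which is what avoids the Scunthorpe problem). The sketch below is Python, not XSL, and only illustrates that rule on a single text string; the glossary dict, the "abbrev" class name, and the function name are hypothetical. In XSLT 2.0, xsl:analyze-string is the usual way to apply a regex to text nodes, but its regex dialect has no lookarounds, so the pattern would need adapting. You would also apply it only to text nodes of paragraphs and titles, not code or attributes.

```python
import html
import re

# Hypothetical glossary: acronym -> (glossentry xml:id, full spelling).
GLOSSARY = {"FF": ("ff", "Firefox")}


def link_acronyms(text: str, glossary: dict[str, tuple[str, str]]) -> str:
    """Wrap whole-word, case-sensitive acronym matches in a titled link."""
    if not glossary:
        return html.escape(text)
    # Try longer entries first so overlapping entries prefer the longest match.
    alternatives = sorted(glossary, key=len, reverse=True)
    # (?<!\w) and (?!\w) enforce whole words: "FFmpeg" does not match "FF".
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(map(re.escape, alternatives)) + r")(?!\w)"
    )
    out = []
    pos = 0
    for m in pattern.finditer(text):
        out.append(html.escape(text[pos:m.start()]))
        entry_id, full = glossary[m.group(1)]
        out.append(
            f'<a class="abbrev" href="#{html.escape(entry_id)}" '
            f'title="{html.escape(full)}">{html.escape(m.group(1))}</a>'
        )
        pos = m.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)
```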
I could not figure out how to do this directly, but that is still my goal. Meanwhile I've tried other things:
Approach 1
Set up a keyboard macro so I can type ff and have that be transformed while I'm typing into <xref linkend="ff"/>.
Pros:
links to the glossary
spells out the abbreviation
Cons:
spells out the abbreviation (it would be nice to keep it short to read, not just short to type)...workaround: make the acronym into a glossterm and put it first in the glossentry (loss of semantics, but maybe that's OK here?)
links to the glossary (I would like it styled differently)...solution: CSS for a.xref
even with the above two worked around, the title comes out as FFFirefox instead of just Firefox (and with others that have more than one synonym, the mashing-together continues)...solution: put an alternate xml:id on your preferred acronym/abbrev, and then make the links in it refer to that id in their endterm attribute (as well as the linkend referring to the first id)
I have to remember to use the keyboard macro rather than just typing and letting the system do the work; any imported text then has to have text replacements done on it for each glossary entry
Approach 2
Using <xsl:param name="glossterm.auto.link" select="1"/> and a keyboard macro to change FF into <glossterm>FF</glossterm>.
Pros:
links to the glossary
Cons:
spells out the abbreviation (it would be nice to keep it short to read, not just short to type)...workaround: make the acronym into a glossterm and put it first in the glossentry (loss of semantics, but maybe that's OK here?)
links to the glossary (I would like it styled differently)...solution: CSS for .glossterm
even with the above two worked around, no title attribute is given
I have to remember to use the keyboard macro rather than just typing and letting the system do the work; any imported text then has to have text replacements done on it for each glossary entry
Approach 1 so far seems better after the workarounds, but is there a way to achieve the ideal I outlined?

Why do I need Markdown?

Why do I need Markdown with a front-end editor like WMD? What does Markdown do to the content that's sent from the WMD editor?
How is Markdown content stored in the backend? Is it stored as-is, like *bold*, or in some other format? Why can't I just HTML-encode it?
Sorry if I sounded very naïve.

It's probably helpful to take a step back and ask some of the larger questions. The issue Markdown is trying to solve is rich editing in the browser. Consider this: at some point, any piece of software that enables rich text has to describe the richness in some manner.
Let's call that description of richness (by which I mean things like "this bit of text is bold" or "this bit of text is a hyperlink") "markup": it marks up the text with meta "richness".
Implementations of rich text can take one of two approaches: either a.) hide the markup from the user, or b.) let them have access to the markup.
For those who choose to hide it, the end result is very often WYSIWYG. The user is oblivious to what is happening behind the scenes, and the editor takes care of the details. Think of MS Word as an example: no regular end user manipulates Word's markup format directly.
For implementations that choose to expose the markup, a markup language is needed so users can interact with it. Examples are HTML, with its <tag> syntax, or BBCode, with things like [tag].
Markdown is one of these languages.
Unlike the other languages I mentioned, Markdown was designed so that its markup resembles the plain-ASCII conventions people already use. For example, it's common for people to surround text with asterisks to set it off, *important*, and in Markdown that notation indicates italics.
In regards to storage, as Stephan pointed out, the system will most likely store the raw markdown, because the user will most likely need to have the possibility of editing, and the original markdown can be recalled for that purpose.
In most of the systems I've built, I store the Markdown and then render it into a second field that caches the HTML. This way I don't have to do the Markdown-to-HTML rendering every time the field is displayed. It takes a little more space, but I'd rather give the user a faster response than use less database storage.
Care should also be taken when accepting Markdown from the browser, as it can easily contain <script> tags that need to be filtered out. Most Markdown implementations also recognize HTML intermingled with Markdown formatting, so to be safe, you need to make sure your inputs and caches are sanitized properly.
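Here is a minimal Python sketch of the store-both pattern: keep the raw Markdown for editing, render once on save, and serve the cached HTML on display. `render_markdown` is a hypothetical stand-in that only escapes the text and wraps it in a paragraph; a real app would call an actual Markdown library (and sanitize its output) at that point.

```python
import html


def render_markdown(source: str) -> str:
    """Hypothetical stand-in renderer; a real app would use a Markdown library."""
    return "<p>" + html.escape(source) + "</p>"


class Post:
    """Keeps the raw Markdown plus a cached HTML rendering of it."""

    def __init__(self, markdown: str) -> None:
        self.markdown = ""
        self.html = ""
        self.renders = 0  # counts renders, to show that display never re-renders
        self.edit(markdown)

    def edit(self, markdown: str) -> None:
        # Raw Markdown is kept so the user can edit it again later;
        # the HTML is rendered once here and cached in a second field.
        self.markdown = markdown
        self.html = render_markdown(markdown)
        self.renders += 1

    def show(self) -> str:
        # Displaying just returns the cached HTML.
        return self.html
```

The trade-off is the one described above: an extra column of storage in exchange for no rendering work on each page view.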
One reason for using an encoding system other than HTML is security:
Markdown and other wiki-style encoding systems do not usually support scripting languages
HTML supports scripting languages in many ways (
The two main security issues are:
Malware criminals use scripts in user-generated content to attack the reader's computer, scripting exploits of known security holes
Freeloaders use scripts to subvert the rest of the site by changing the content frame or styles, i.e. ads, menus, logos, etc. This can also be criminal behaviour, if not just annoying
By using an intermediate language such as Markdown, you have total control over the rendered output
Filtering HTML is possible, but it is also complex and risky
The other significant reason for an alternate encoding system is enforcement of style. Normal HTML has too many options. By limiting the available options, users can only use certain styles. This usually makes for cleaner-looking and more readable content (compare SO to eBay)
The main reason for using Markdown is the readability of marked-up text. For instance, you can send it in a plain-text email and the reader will still understand the emphasis and the bullets, and the text will be divided into paragraphs, et cetera.
When it comes to storing data, it depends. If you enable Markdown in the WordPress blog engine, it stores the data as the user entered it, in Markdown. Stack Overflow, however, seems to store the data as HTML. At least, the "Stack Overflow data dumps" contain HTML, not Markdown (I've seen people complaining that they have to convert it back).
If you use the WMD editor, you can show the user what the output will look like after it is converted to HTML. Even though Markdown syntax is really simple, it is not hard to make mistakes, so it is best to show users the output.
Another reason for using Markdown instead of a WYSIWYG control: a WYSIWYG control lets the user put HTML into data you display on your web page. So you have to be the one who decides what is simply incorrect HTML and what is an evil XSS/CSRF/whatever injection. With Markdown, you simply convert *something* to <em>something</em>, remove any unknown HTML elements, and you're done.
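A minimal Python sketch of the "convert the Markdown, neutralize the HTML" idea, assuming a tiny Markdown subset (only single-asterisk emphasis). Note that it escapes all raw HTML rather than removing unknown elements, which is the stricter of the two options; a real renderer would handle far more syntax.

```python
import html
import re

# Single-asterisk emphasis only: *text* on one line, no nested asterisks.
EMPHASIS = re.compile(r"\*([^*\n]+)\*")


def render_safely(source: str) -> str:
    """Escape every HTML character first, then apply the Markdown subset."""
    escaped = html.escape(source)  # <script> becomes &lt;script&gt;
    return EMPHASIS.sub(r"<em>\1</em>", escaped)
```

Because escaping happens before conversion, the only tags in the output are the ones the converter itself emits.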

Find and Replace and a WYSIWYG Editor

My problem is as follows:
I have a column: ProductName. The text entered here comes from TinyMCE, so it has all kinds of tags. The user wants to be able to do a Find-and-Replace on all products, and it has to support coloring.
For example - let's say this is a portion of a ProductName:
other text.. <strong>text text <font color="#ff6600">colortext®</font></strong> ..other text
Now the user wants to replace:
<font color="#ff6600">colortext®</font>
The original name has the <strong> tag in it, so it appears bold. So the user makes the search text bold too, and now the text he is searching for is:
<strong><font color="#ff6600">colortext®</font></strong>
Obviously I'm not going to find it. Plus there's the matter of spaces: in one place there is a space, and in another there isn't.
Is there a way to overcome this?
Strip the HTML tags from the search text and do a plain text search first. Then, part by part (i.e., text node by text node), take the element path of the search text's parts, and compare these with their counterparts in the found text. If the paths for all parts match, you're done.
Edit: By path, I meant something similar to XPath, or the path notion in the TinyMCE editor. Example: the plain-text part of the search text is "colortext®". The path of this text node in the search text is <strong>/<font color="#ff6600">. Search for the same plain text in the text body (trivial), and take its path, which is also <strong>/<font color="#ff6600">. (Compare this with the path of "other text..", which is /, and of "text text", which is <strong>.) The two paths are the same, so this is a real match. If you have a DOM tree representation, determining the path shouldn't be difficult.
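Here is a rough Python sketch of the path comparison, using the standard library's html.parser. The class and function names are hypothetical. It is simplified: each text part of the search is checked independently, so it does not verify that the parts are adjacent in the body, and whitespace is only trimmed at the ends of each text node.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img"}  # these never wrap text, so they are not path steps


class PathCollector(HTMLParser):
    """Collects each non-blank text node together with its element path."""

    def __init__(self):
        super().__init__()
        self.stack = []  # open elements, e.g. ["strong", 'font color="#ff6600"']
        self.parts = []  # list of (text, path tuple)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # A path step is the tag plus its attributes, like <font color="#ff6600">.
        step = " ".join([tag] + [f'{k}="{v}"' for k, v in attrs])
        self.stack.append(step)

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this tag name.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].split(" ", 1)[0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.parts.append((data.strip(), tuple(self.stack)))


def text_parts(fragment):
    parser = PathCollector()
    parser.feed(fragment)
    parser.close()
    return parser.parts


def formatted_match(search_html, body_html):
    """True if every text part of the search occurs in the body under the same path."""
    body_parts = text_parts(body_html)
    return all(
        any(text in body_text and path == body_path for body_text, body_path in body_parts)
        for text, path in text_parts(search_html)
    )
```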
You're asking for several related, but discrete, abilities:
Search and Replace content
Search and Replace formatting
Search and Replace similar (i.e., ignore trivial differences in whitespace)
You should take this in steps; otherwise it becomes overwhelming, and a single search algorithm won't be able to do all three without intense effort and hard-to-maintain code.
First, look at the similarity problem. Make a search that ignores spaces and case. You might want to look into Lucene or another search engine technology if you also need to deal with "bowl" vs. "bowls" and "intelligent" vs. "smart", though I expect that is beyond your current needs.
Once you have that working, it becomes one layer in your stack of searches.
Second, look at formatting search. This is typically done using tokens or tags, which you already have in the form of HTML. However, you have to be able to deal with tags out of sequence, so <b><i>text</i></b> needs to be caught by a search for <i><b>text</b></i>, and so does the malformed representation where tags aren't nested properly, such as <b><i>text</b></i>.
One way to do this is to pre-parse the string and apply the formatting styles to each character. So you'd have a t that's bold and italic, an e that's bold and italic, etc. To make this easier and faster, use a hash to represent the style combination: read the first character, figure out what style it has (keep track of styles turning on and off as you encounter tags), and if that combination already exists in the hash, assign its hash number to the letter. If it doesn't, get a new hash number and assign that.
Now you can compare the letter and its style hash against your search and get format and content matches. Stack that on top of your similar match and you have what you need.
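A Python sketch of the per-character style idea, under a few assumptions: a "style" is a tag plus its attributes, a style combination is a frozenset (so tag order does not matter), and each combination is interned to a small integer id that plays the role of the "hash number". Skipping whitespace is a crude stand-in for the "similar" layer. All names are hypothetical.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img"}  # never closed, so they carry no style


def style_of(tag, attrs):
    # A style is the tag plus its sorted attributes, e.g. ("font", (("color", "#ff6600"),)).
    return (tag, tuple(sorted((k, v or "") for k, v in attrs)))


class StyledText(HTMLParser):
    """Records every non-whitespace character with an id for its style combination."""

    def __init__(self, registry):
        super().__init__()
        self.registry = registry  # shared: frozenset of styles -> small integer id
        self.open_styles = []
        self.chars = []  # list of (character, style id)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_styles.append(style_of(tag, attrs))

    def handle_endtag(self, tag):
        # Close the most recent style with this tag name even if it is not on
        # top, so badly nested input such as <b><i>text</b></i> still works.
        for i in range(len(self.open_styles) - 1, -1, -1):
            if self.open_styles[i][0] == tag:
                del self.open_styles[i]
                break

    def handle_data(self, data):
        # A frozenset ignores tag order, so <b><i>x</i></b> equals <i><b>x</b></i>.
        combo = frozenset(self.open_styles)
        style_id = self.registry.setdefault(combo, len(self.registry))
        for ch in data:
            if not ch.isspace():  # crude "similar" layer: ignore whitespace
                self.chars.append((ch, style_id))


def annotate(fragment, registry):
    parser = StyledText(registry)
    parser.feed(fragment)
    parser.close()
    return parser.chars


def find_styled(search_html, body_html):
    """Index of the first content-and-style match in the body's
    whitespace-free character list, or -1 if there is none."""
    registry = {}
    needle = annotate(search_html, registry)
    hay = annotate(body_html, registry)
    n = len(needle)
    for start in range(len(hay) - n + 1):
        if hay[start:start + n] == needle:
            return start
    return -1
```

Sharing one registry between the search text and the body is what makes the ids comparable.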
-Adam
If it's valid XML, an XSLT would be trivial for this kind of exercise.
Use an identity template, and then add an XPath to find the specific node you want:
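<xsl:template match="@*|node()">
<xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
</xsl:template>
<!-- Override the identity copy for the specific node you want to change -->
<xsl:template match="//strong/font">
<xsl:copy>
<!-- Insert the replacement text here -->
</xsl:copy>
</xsl:template>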
When working with XML, this would be a maintainable, extensible solution.
I'm not sure I understand everything you said, but regular expressions seem a good way to overcome the problem you're talking about.

How do you handle translation of text with markup?

I'm developing multi-language support for our web app. We're using Django's helpers around the gettext library. Everything has been surprisingly easy, except for the question of how to handle sentences that include significant HTML markup. Here's a simple example:
Please log in to continue. (In the actual markup, "log in" is a link.)
Here are the approaches I can think of:
1. Change the link to include the whole sentence. Regardless of whether the change is a good idea in this case, the problem with this solution is that the UI becomes dependent on the needs of i18n, when the two should ideally be independent.
2. Mark the whole string above for translation (formatting included). The translation strings would then include the HTML directly. The problem with this is that changing the HTML formatting requires changing all the translations.
3. Tightly couple multiple translations, then use string interpolation to combine them. For the example, the phrases "Please %s to continue" and "log in" could be marked separately for translation, then combined. "log in" is localized, then wrapped in the HREF, then inserted into the translated phrase, which keeps the %s to mark where the link should go. This approach complicates the code and breaks the independence of translation strings.
Are there any other options? How have others solved this problem?
Solution 2 is what you want. Send them the whole sentence, with the HTML markup embedded.
Reasons:
The predominant translation tool, Trados, can preserve the markup from inadvertent corruption by a translator.
Trados can also auto-translate text that it has seen before, even if the content of the tags has changed (as long as the number of tags and their position in the sentence are the same). At the very least, the translator will give you a good discount.
Styling is locale-specific. In some cases, bold will be inappropriate in Chinese or Japanese, and italics are less commonly used in East Asian languages, for example. The translator should have the freedom to either keep or remove the styles.
Word order is language-specific. If you were to segment the above sentence into fragments, it might work for English and French, but in Chinese or Japanese the word order would not be correct when you concatenate. For this reason, it is best i18n practice to externalize entire sentences, not sentence fragments.
2, with a potential twist.
You certainly could localize the whole string, like:
loginLink=Please log in to continue
However, depending on your tooling and your localization group, they might prefer for you to do something like:
// tokens in this string add html links
loginLink=Please {0}log in{1} to continue
That would be my preferred method. You could use a different substitution pattern if you have localization tooling that ignores certain characters. E.g.
loginLink=Please %startlink%log in%endlink% to continue
Then perform the substitution in your JSP, servlet, or the equivalent in whatever language you're using.
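The same token idea in Python, as a sketch: the catalog dict, its French entry, and the "/login/" URL are hypothetical, and `str.format` fills {0} and {1} with the opening and closing link tags. One caveat of this choice: any other literal braces in a translation would also be interpreted by `format`.

```python
import html

# Hypothetical catalog entries using the {0}/{1} link tokens.
TRANSLATIONS = {
    "en": "Please {0}log in{1} to continue",
    "fr": "Veuillez {0}vous connecter{1} pour continuer",
}


def login_prompt(lang: str, login_url: str) -> str:
    """Insert the link markup at the positions the translator chose."""
    template = TRANSLATIONS[lang]
    start = f'<a href="{html.escape(login_url)}">'
    return template.format(start, "</a>")
```

Because the translator places the tokens, the link can land anywhere in the sentence, and the markup itself never appears in the translation files.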
Disclaimer: I am not experienced in internationalization of software myself.
1. I don't think this would be good in any case; it just introduces too much coupling …
2. As long as you keep formatting sparse in the parts that need to be translated, this could be okay. Giving translators the possibility to give special words importance (by making them a link, or perhaps using <strong /> emphasis) sounds like a good idea. However, those translations with (X)HTML possibly cannot easily be used anywhere else.
3. This sounds like unnecessary work to me …
If it were me, I think I would go with the second approach, but I would put the URI into a formatting parameter, so that this can be changed without having to change all those translations.
Please log in to continue.
You should keep in mind that you may need to teach your translators basic (X)HTML if you go with this approach, so that they do not break your markup and know what to expect from the text they write. Anyhow, this additional knowledge might lead to better semantic markup, because, as mentioned above, texts could be translated and annotated with (X)HTML to reflect local writing style.
Whatever you do, keep the whole sentence as one string. You need to understand the whole sentence to translate it correctly.
Not all words should be translated in all languages: e.g. in Norwegian one doesn't use "please" (we can say "vær så snill", literally "be so kind", but used as a command it sounds too forceful), so the correct Norwegian would be:
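A sketch of the second approach with the URI as a formatting parameter, in Python since the question uses Django's gettext helpers. `gettext.NullTranslations` returns messages unchanged, so it stands in for a real catalog here; the "/accounts/login/" URL is hypothetical. The named `%(url)s` placeholder is the Python %-formatting style that gettext-based code commonly uses.

```python
import gettext
from html import escape

# NullTranslations returns msgids unchanged; a real app would load a catalog.
_ = gettext.NullTranslations().gettext


def login_prompt(login_url: str) -> str:
    # The whole sentence, markup included, is one translatable string;
    # only the URL is a parameter, so moving the page doesn't touch translations.
    message = _('Please <a href="%(url)s">log in</a> to continue.')
    return message % {"url": escape(login_url)}
```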
"Logg inn for å fortsette" lit.: "Log in to continue" or
"Fortsett ved å logge inn" lit.: "Continue by to log in" etc.
You must allow completely changing the order, e.g. in a fictional demo language:
"Für kontinuer Loggen bitte ins" (if it was real) lit.: "To continue log please in"
Some language may even have one single word for (most of) this sentence too...
I'd recommend solution 1, or possibly "Please %{startlink}log in%{endlink} to continue"; this way the translator can make the whole sentence a link if that's more natural, and the sentence can be completely restructured.
Interesting question; I'll be having this problem very soon. I think I'll go for 2, without any kind of tricky stuff. HTML markup is simple, URLs won't move anytime soon, and if anything changes, a new entry will be created in django.po, so we get a chance to review the translation (e.g., a script should check for empty translations after makemessages).
So, in template :
{% load i18n %}
{% trans 'hello world' %}
... then, after python manage.py makemessages, I get this in my django.po:
#: templates/out.html:3
msgid "hello world"
msgstr ""
I change it to my needs
#: templates/out.html:3
msgid "hello world"
msgstr "bonjour monde"
... and in the simple yet frequent cases I'll encounter, it won't be worth any further trouble. The other solutions here seem quite smart, but I don't think the solution to markup problems is more markup. Plus, I want to avoid too much confusing stuff inside templates.
Your templates should be quite stable after a while, I guess, but I don't know what other trouble you expect. If the content changes over and over, perhaps that content's place is not inside the template but inside a model.
Edit: I just checked the documentation: if you ever need variables inside a translation, there is blocktrans.
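For the "script should check for empty translations after makemessages" idea, here is a rough Python sketch that lists msgids whose msgstr is empty. It is a simplified .po reader written for illustration, not a full parser: it handles continuation lines and skips the header entry, but ignores plural forms (msgid_plural/msgstr[n]) and msgctxt. A dedicated library such as polib would be more robust.

```python
def untranslated_msgids(po_text: str) -> list[str]:
    """List msgids whose msgstr is empty (sketch: no plurals, no msgctxt)."""
    missing = []
    msgid = msgstr = None
    field = None  # which field a bare "..." continuation line extends

    def finish_entry():
        # The header entry has msgid "" and is not a real message.
        if msgid and msgstr == "":
            missing.append(msgid)

    for raw in po_text.splitlines():
        line = raw.strip()
        if line.startswith("msgid "):
            finish_entry()
            msgid, msgstr, field = line[6:].strip()[1:-1], None, "msgid"
        elif line.startswith("msgstr "):
            msgstr, field = line[7:].strip()[1:-1], "msgstr"
        elif line.startswith('"') and field == "msgid":
            msgid += line[1:-1]
        elif line.startswith('"') and field == "msgstr":
            msgstr += line[1:-1]
        else:
            field = None  # comments or blank lines end the continuation
    finish_entry()
    return missing
```

Run against the django.po above before "I change it to my needs", it would report "hello world".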
Makes no sense, how would you translate "log in"?
I don't think many translators have experience with HTML (the regular non-HTML-aware translators would be cheaper)
I would go with option 3, or use "Please %slog in%s to continue" and replace the %s with parts of the link.