Some sites use backticks (`code`) for code formatting, while others that I use have [c][/c] for code, or [b][/b] for bold, etc. Then some other sites, like YouTube, use things like asterisks (*bold*) for bold and underscores (_italics_) for italics (I think WhatsApp uses this same convention).
What are these called? Is the backtick syntax a form of Markdown, and is [c][/c] a form of HTML? What is the syntax used by YouTube and WhatsApp called?
And what, in general, do we call this class of formatting?
The general term for all of these types of languages would be Lightweight Markup Languages, although that term is not actually used much. Generally, "Markup Language" will suffice.
The [b][/b] syntax is most likely BBCode (although there is no [c] tag that I am aware of, so I don't know what that would be).
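BBCode is normally translated into HTML by the forum software before display. Below is a minimal sketch of that translation, assuming a fixed tag mapping (which HTML element each tag becomes varies between forums) and well-nested tags. Real BBCode parsers also handle attributes such as `[url=...]` and malformed input.

```python
import html
import re

# Assumed mapping for a few common BBCode tags; actual forums differ.
BBCODE_TO_HTML = {
    "b": "strong",
    "i": "em",
    "u": "u",
    "code": "code",
}

def bbcode_to_html(text: str) -> str:
    """Convert simple, well-nested [tag]...[/tag] pairs into HTML."""
    # Escape the user's text first so a literal < or & cannot inject HTML.
    result = html.escape(text, quote=False)
    for bb, tag in BBCODE_TO_HTML.items():
        pattern = re.compile(
            r"\[" + bb + r"\](.*?)\[/" + bb + r"\]", re.DOTALL | re.IGNORECASE
        )
        result = pattern.sub(lambda m: f"<{tag}>{m.group(1)}</{tag}>", result)
    return result

print(bbcode_to_html("[b]bold[/b] and [i]italic[/i] and 1 < 2"))
# <strong>bold</strong> and <em>italic</em> and 1 &lt; 2
```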
Many markup languages (including Markdown) make use of backticks (`), underscores (_) and asterisks (*), and that use does not always mean the same thing. For an example of various languages, see this table on Wikipedia.
Note that some of those languages predate Markdown and may have even contributed to Markdown adopting the same behavior. Many others are newer than Markdown and could credit Markdown as their inspiration. That said, many of these markup languages which are "inspired" by Markdown are subtly different in various ways and are not actually Markdown. Slack and WhatsApp are two recent examples. Note that despite their similarity to Markdown, Wikipedia lists each of them as their own separate markup language due to those differences.
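To make the "subtly different" point concrete: in Markdown a single pair of asterisks means emphasis (usually rendered as italics, `<em>`), while WhatsApp and Slack use the same `*text*` for bold. The sketch below applies one small rule table per dialect. The tables are simplified assumptions covering only a few delimiters, not complete grammars of either language.

```python
import re

# Simplified rule tables (assumptions): delimiter -> HTML tag it produces.
# Order matters: "**" must be tried before "*" in the Markdown table.
DIALECTS = {
    "markdown": [("**", "strong"), ("*", "em"), ("_", "em")],
    "whatsapp": [("*", "strong"), ("_", "em"), ("~", "s")],
}

def render(text: str, dialect: str) -> str:
    """Render delimiter pairs such as *word* to HTML using one dialect's rules."""
    for delim, tag in DIALECTS[dialect]:
        d = re.escape(delim)
        # Non-greedy match of the text between an opening and a closing delimiter.
        text = re.sub(d + r"(.+?)" + d, rf"<{tag}>\1</{tag}>", text)
    return text

sample = "*hello* _world_"
print(render(sample, "markdown"))  # <em>hello</em> <em>world</em>
print(render(sample, "whatsapp"))  # <strong>hello</strong> <em>world</em>
```

The same input produces different HTML, which is why Slack's and WhatsApp's formats are treated as separate languages rather than as Markdown.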
Finally, most of these lightweight markup languages map onto a subset of HTML. That is, they cover the small portion of HTML that is most commonly used in prose, and they are generally converted to HTML before being displayed. For example, Stack Overflow converts Markdown questions, answers and comments to HTML and serves that HTML to your browser for display. HTML (HyperText Markup Language) itself uses angle brackets (<em>italics</em>), but so do XML (Extensible Markup Language) and various other languages. These would not be considered "lightweight."
I'm trying to understand Markdown's relationship to HTML. If I understand correctly both are markup languages (an umbrella term describing languages that add formatting elements to plain-text documents). Markdown converts plain text to HTML.
My understanding is that Markdown is a superset of HTML:
Markdown is a popular markup language that is a superset of HTML.
I'm assuming that it's a strict or proper superset. Drawing a parallel from What does it mean when one language is a parallel superset of another?, I interpret that to mean that every valid HTML program is also a valid Markdown program (e.g. HTML is understood in a Jupyter Notebook Markdown cell), but that the converse is not true.
What seems contradictory to me is this: if Markdown is a superset of HTML, why can't Markdown do everything HTML can? (I would expect the opposite, since a superset extends a language without removing or changing any of its existing features.) Also, I would expect HTML to be a superset of Markdown, since HTML is more expressive and harder for most humans to read.
Below is a diagram trying to mimic that in What does “Objective-C is a superset of C more strictly than C++” mean exactly?
That documentation is misleading. Markdown itself is not a superset of HTML. The documentation for the original Markdown project is pretty clear:
Markdown is not a replacement for HTML, or even close to it. Its syntax is very small, corresponding only to a very small subset of HTML tags. The idea is not to create a syntax that makes it easier to insert HTML tags. In my opinion, HTML tags are already easy to insert. The idea for Markdown is to make it easy to read, write, and edit prose. HTML is a publishing format; Markdown is a writing format. Thus, Markdown’s formatting syntax only addresses issues that can be conveyed in plain text.
Today there are several flavours of Markdown, many of which add features that were not present in the original version like tables and syntax-highlighted code blocks. This doesn't change the fundamental fact that Markdown covers a subset of HTML.
(Technically speaking, Markdown isn't a subset of HTML either. *, for example, has no special meaning in HTML. Unconverted Markdown documents might be well-formed HTML but the semantics are very different. But Markdown syntax maps to a subset of HTML tags.)
However, the very next paragraph in the original documentation says:
For any markup that is not covered by Markdown’s syntax, you simply use HTML itself. There’s no need to preface it or delimit it to indicate that you’re switching from Markdown to HTML; you just use the tags.
Since you can use HTML directly in Markdown, it could be considered a superset of HTML. For example, this is valid Markdown:
```markdown
# My awesome title
I <em>really</em> like coffee
```
If you pass an HTML document through a conforming Markdown processor it should come out the other side untouched. Being able to directly use HTML in Markdown is very similar to how one can directly use C in C++. This may be what the Jupyter documentation means.
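The pass-through behaviour can be illustrated with a deliberately tiny, hypothetical converter (not a real Markdown implementation). It turns `# ` lines into `<h1>`, wraps other text lines in `<p>`, and leaves any HTML, whether a whole block or inline tags such as `<em>`, exactly as written.

```python
def tiny_markdown(source: str) -> str:
    """Toy converter: headings and paragraphs only; HTML is passed through."""
    out = []
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines just separate blocks
        if stripped.startswith("# "):
            out.append(f"<h1>{stripped[2:]}</h1>")
        elif stripped.startswith("<"):
            # Toy rule: any line starting with "<" is an HTML block and is
            # emitted untouched (real processors check for block-level tag names).
            out.append(stripped)
        else:
            # Inline HTML inside ordinary text is also left as-is.
            out.append(f"<p>{stripped}</p>")
    return "\n".join(out)

doc = "# My awesome title\n\nI <em>really</em> like coffee"
print(tiny_markdown(doc))
# <h1>My awesome title</h1>
# <p>I <em>really</em> like coffee</p>
```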
I distinctly recognise the problem described in Jeff Eaton's article "The Battle for the Body Field" on A List Apart: handing clients a CMS that strikes a balance between conceptual simplicity when editing content and flexibility in the flow and structure of that content, whilst still generating clean, forward-compatible code and responsive layouts.
I'm now convinced that some sort of custom tags are the solution to this problem, even if they are wrapped in a WYSIWYG editor.
I'd like to keep preprocessing server-side until web components are more widely supported natively, and I favour Ruby/Rails for development.
So what libraries are available that would help with preprocessing and expanding custom XML or HTML tags in this way?
XSLT seems too limited, and Radius is perhaps a contender, though it no longer appears to be in active development.
I tend to favour Markdown because it's extensible and maps to a subset of HTML. In Ruby, the main contenders are Redcarpet and kramdown. There are others, but I have not used them.
Redcarpet is mature and solid. It is also highly performant and extensible. You can define your own custom tags and syntax. It allows you to pre-process and post-process content.
It has disadvantages, though. Because it adheres to the Markdown standard, it can be limiting. I wrote my own figure tag syntax (shown below) and found that the resulting figure was being wrapped in paragraph tags, leading to invalid HTML: a <p> may only contain phrasing content, and <figure> is not phrasing content. This is not Redcarpet's fault; it's how Markdown works.
```
!![figure caption](image_url "img alt text")
```
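One common workaround is to post-process the rendered HTML and unwrap any figure that ended up alone inside a paragraph. Redcarpet allows pre- and post-processing of the whole document, so the same idea could live there. The sketch below shows it in Python instead: a hypothetical `expand_figures` preprocessor for the `!![caption](url "alt")` syntax above, and a hypothetical `unwrap_figures` postprocessor. Both function names and the regexes are assumptions for illustration, not part of any library.

```python
import html
import re

# Hypothetical custom syntax: !![caption](url "alt text")
FIGURE_SYNTAX = re.compile(r'!!\[([^\]]*)\]\(([^\s)]+)\s+"([^"]*)"\)')

def expand_figures(markdown_text: str) -> str:
    """Preprocess: replace the custom figure syntax with <figure> HTML."""
    def to_html(m):
        caption, url, alt = (html.escape(g) for g in m.groups())
        return (f'<figure><img src="{url}" alt="{alt}">'
                f"<figcaption>{caption}</figcaption></figure>")
    return FIGURE_SYNTAX.sub(to_html, markdown_text)

# Postprocess: a figure that the Markdown renderer wrapped in <p>...</p>.
WRAPPED_FIGURE = re.compile(r"<p>\s*(<figure>.*?</figure>)\s*</p>", re.DOTALL)

def unwrap_figures(rendered_html: str) -> str:
    """Remove paragraph tags that surround nothing but a figure."""
    return WRAPPED_FIGURE.sub(r"\1", rendered_html)

source = '!![A cat](cat.jpg "A sleeping cat")'
figure_html = expand_figures(source)
# Simulate a renderer treating the line as ordinary paragraph text:
print(unwrap_figures(f"<p>{figure_html}</p>"))
```

Paragraphs that contain other text besides the figure are deliberately left alone by the postprocessor, since unwrapping them would change their meaning.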
An alternative is kramdown, which is written with flexibility in mind. It allows full customisation of your syntax.
A webmaster-tools utility complains that my page is missing meta language information.
What should I do if my page contains a mixture of text in multiple languages (all Western European languages)?
Most “webmaster-tools” complaints are best ignored (or not read at all). To get help with specific messages, please identify the tool, the message, and your URL.
A mixture of languages is a problem in itself, and best avoided by putting different language versions into different pages. If you need a mix of languages, use the lang attribute to specify the language of each part.
There is very little of use that you can do with meta tags regarding a mixture of languages.
Have you looked at the "Multi-regional and multilingual sites" article? It provides some good suggestions.
What HTML markup and tags should I use if I write the following in an article?
This `foreign word` is translated from the foreign language as `this word in the reader's native language`.
Use the most appropriate markup (using a generic element if nothing better presents itself) with a lang attribute.
```html
<body lang="en">
<!-- etc -->
<p><span lang="de">unbekanntes Flugobjekt</span> is German for UFO.</p>
```
This won't generally provide automatic translation, but the option exists for browsers / browser extensions to provide such a mechanism. Translation tools such as Google Translate may use it as a hint to identify the "from" language. Text to speech software may use it to select a pronunciation guide. And so on.
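As a rough illustration of how such software could read these hints, the sketch below uses Python's standard `html.parser` to report each text run with its effective language, inheriting `lang` from the nearest ancestor that sets it. It is a simplification (it ignores `xml:lang`, HTTP headers and badly nested markup) and is not how any particular browser or translation tool works.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not be pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}

class LangCollector(HTMLParser):
    """Collect (language, text) pairs, inheriting lang from ancestors."""

    def __init__(self):
        super().__init__()
        self.stack = [""]  # "" means the language is unknown
        self.runs = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        lang = dict(attrs).get("lang")
        # No lang attribute: inherit the parent's language.
        self.stack.append(lang if lang is not None else self.stack[-1])

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.runs.append((self.stack[-1], data.strip()))

parser = LangCollector()
parser.feed('<body lang="en"><p><span lang="de">unbekanntes Flugobjekt</span>'
            ' is German for UFO.</p></body>')
parser.close()
print(parser.runs)
# [('de', 'unbekanntes Flugobjekt'), ('en', 'is German for UFO.')]
```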
There is no HTML markup specifically for such purposes. It really depends on the conventions of the human language used on the page, as well as presentation style. Typically, either quotation marks or italic is used when mentioning words or expressions, rather than using them in normal use. For these, there are different options in HTML. Quotation marks are best written as such, using proper characters as per language rules, though some people still think that q markup is useful. For italic, you can use i markup or CSS font-style: italic.
In any case, if it is relevant to your purposes somehow that translations are marked up, e.g. in order to style them uniformly later, the best shot is to use classes.
The use of lang markup is recommended in principle, and it is gaining some practical importance (e.g. for automatic hyphenation). In the following example, the span markup is used only to indicate the language (because you need an element for that):
```html
The French word “<span lang=fr>cheval</span>” means “horse”.
```
How would you customise an HTML page so that it accepts multiple languages?
I'll quote the W3C's Internationalization Quick Tips for the Web:
Encoding. Use Unicode wherever possible for content, databases, etc. Always declare the encoding of content.
Escapes. Use characters rather than escapes (e.g. `&aacute;`, `&#xE1;` or `&#225;`) whenever you can.
Language. Declare the language of documents and indicate internal language changes.
Presentation vs. content. Use style sheets for presentational information. Restrict markup to semantics.
Images, animations & examples. Check for translatability and inappropriate cultural bias.
Forms. Use an appropriate encoding on both form and server. Support local formats of names/addresses, times/dates, etc.
Text authoring. Use simple, concise text. Use care when composing sentences from multiple strings.
Navigation. On each page include clearly visible navigation to localized pages or sites, using the target language.
Right-to-left text. For XHTML, add dir="rtl" to the html tag. Only re-use it to change the base direction.
Check your work. Validate! Use techniques, tutorials, and articles at http://www.w3.org/International/
For more information, follow the W3C recommendations at http://www.w3.org/International/
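The "Escapes" tip is easy to check: a named reference such as `&aacute;` and the numeric references `&#xE1;` and `&#225;` all decode to the same single character, which a UTF-8 document can simply contain directly. Using Python's standard `html` module:

```python
import html

escapes = ["&aacute;", "&#xE1;", "&#225;"]
decoded = [html.unescape(e) for e in escapes]

print(len(set(decoded)))            # 1 (all three are the same character)
print(decoded[0] == "\u00e1")       # True (U+00E1, "á")
print("\u00e1".encode("utf-8"))     # b'\xc3\xa1' (how it is stored in UTF-8)
```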
One way to do this would be to use a decent server-side web technology (there are many to choose from) that supports internationalization. Essentially, it comes down to specifying the different pieces of text that the site needs to display, assigning a label to each message, creating a version of each label in a separate file per language, and then, in the server-side code, referencing the label name and a language code to display the text in the appropriate language.
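The label-and-language-file approach boils down to a lookup keyed by language and label. Below is a minimal sketch with the catalogs inlined as dictionaries rather than separate files, a fallback to English for missing translations, and invented label names (`greeting`, `submit`). Real frameworks (gettext, Rails I18n and so on) add file loading, pluralisation and locale negotiation on top of this.

```python
# Hypothetical message catalogs; in practice each would live in its own
# file (one per language) and be loaded at startup.
CATALOGS = {
    "en": {"greeting": "Welcome", "submit": "Send"},
    "fr": {"greeting": "Bienvenue", "submit": "Envoyer"},
    "de": {"greeting": "Willkommen"},  # "submit" not translated yet
}
DEFAULT_LANG = "en"

def translate(label: str, lang: str) -> str:
    """Look up a label in the requested language, falling back to the default."""
    catalog = CATALOGS.get(lang, {})
    if label in catalog:
        return catalog[label]
    return CATALOGS[DEFAULT_LANG][label]  # KeyError if the label is unknown

print(translate("greeting", "fr"))  # Bienvenue
print(translate("submit", "de"))    # Send (falls back to English)
print(translate("greeting", "es"))  # Welcome (no Spanish catalog)
```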
The first step is to determine your requirements and your hosting environment, and then to figure out what options are available to you. If you can provide more information, we might be able to steer you in a better direction.
If I make a bunch of assumptions about what you are trying to achieve:
Serve the document as UTF-8
Browsers will then tend to submit form data to the server as UTF-8 (forms being the only way that a page is going to "accept" anything), and UTF-8 can handle the characters used in just about every language.
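Concretely, serving as UTF-8 means declaring the encoding both in the HTTP `Content-Type` header and in the page itself (`<meta charset="utf-8">`), and then decoding submitted form data as UTF-8. Below is a minimal sketch using only Python's standard library. The page content and the `name` field are made up for illustration, and the submitted body is a hand-written example of what a browser would typically send.

```python
from urllib.parse import parse_qs

# Declare the encoding in the HTTP header and in the document itself.
PAGE = """<!DOCTYPE html>
<html lang="en">
<head><meta charset="utf-8"><title>Form</title></head>
<body><form method="post"><input name="name"><button>Send</button></form></body>
</html>"""
body_bytes = PAGE.encode("utf-8")  # the bytes actually sent to the browser
HEADERS = {
    "Content-Type": "text/html; charset=utf-8",
    "Content-Length": str(len(body_bytes)),
}

# Submitting "José" from that page typically sends percent-encoded UTF-8 bytes.
submitted = "name=Jos%C3%A9"
fields = parse_qs(submitted, encoding="utf-8")
print(fields["name"][0])  # José
```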