How are tabs interpreted in CommonMark? - html

See the description before Example 6 in the CommonMark spec at: http://spec.commonmark.org/0.27/#example-5
I am trying to understand how the following code leads to a code-block starting with two spaces.
>→→foo
Example 6 shows that this would translate to the following.
<blockquote>
<pre><code>  foo
</code></pre>
</blockquote>
But Section 2.2 clearly states:
However, in contexts where whitespace helps to define block structure, tabs behave as if they were replaced by spaces with a tab stop of 4 characters.
So as per my understanding, the above Markdown behaves like the following (I denote a space with a dot).
>........foo
Since one optional space is allowed after >, and 4 spaces are used to indent a code block, we are left with:
>...foo
That's a code-block starting with three spaces. How does CommonMark claim then that it should lead to a code-block starting with two spaces? What am I missing?

The key is in the very first paragraph of the Tabs section (emphasis added):
Tabs in lines are not expanded to spaces. However, in contexts where whitespace helps to define block structure, tabs behave as if they were replaced by spaces with a tab stop of 4 characters.
Notice that it says "4 characters", not 4 spaces.
If you configure your text editor to use a tab stop of length four and to replace tabs with spaces (any good text editor should offer this setting), the text editor will use columns that are four characters wide. When you press the tab key, it will advance the cursor to the start of the next column, and each column is only ever four characters wide. If the column already contains any characters, then only as many spaces are added as are needed to total four characters, which in this case is fewer than four spaces.
For example, if you type an angle bracket (>) character in your editor and then press tab, you will get the following (when configured to replace tabs with spaces):
>···
Therefore the angle bracket plus the tab moves forward to the end of the column (four characters) for a total of three spaces. As we are now at the beginning of the next column, pressing tab a second time would move us to the next column (4 more spaces) for a total of 7 spaces:
>·······
We can confirm this is the correct interpretation with a more recent change to the spec committed in 3bc01c5dc (which apparently hasn't made it to a release yet). As the commit comment suggests, the clarification helps the math make more sense (emphasis added):
Normally the > that begins a block quote may be followed
optionally by a space, which is not considered part of the
content. In the following case > is followed by a tab,
which is treated as if it were expanded into three spaces.
Since one of these spaces is considered part of the
delimiter, foo is considered to be indented six spaces
inside the block quote context, so we get an indented
code block starting with two spaces.
Notice the added sentence (in bold) which confirms that the first tab only adds "three spaces".
Therefore, as we have now established, we start with an angle bracket plus seven spaces. So first we break off the blockquote delimiter, which consists of the angle bracket and the first space (in the following examples the | is used to indicate where the parser breaks the string and should not be counted as a character):
>·|······
The text contained in the blockquote is now indented six spaces. Four of them are the code block delimiter:
>·|····|··
Which leaves two spaces at the start of the code block.
Of course, as stated back at the beginning (of the section in the spec), the tabs aren't actually replaced with spaces, it just behaves as if they were. And that can be confusing at times. It may help to configure your text editor to always replace tabs with spaces and then you can avoid this confusion.
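The arithmetic above can be checked with a short Python sketch. It is only an illustration under simplifying assumptions: the helper names expand_tabs and blockquote_code_content are made up for this example, and the code actually expands the tabs, whereas a real CommonMark parser only behaves as if it did.

```python
def expand_tabs(line: str, tab_stop: int = 4) -> str:
    """Replace each tab with the spaces needed to reach the next tab stop."""
    out = []
    col = 0
    for ch in line:
        if ch == "\t":
            # A tab fills the rest of the current 4-character column.
            width = tab_stop - (col % tab_stop)
            out.append(" " * width)
            col += width
        else:
            out.append(ch)
            col += 1
    return "".join(out)


def blockquote_code_content(line: str) -> str:
    """Hypothetical helper: strip '>' plus one optional space,
    then the four spaces that mark an indented code block."""
    expanded = expand_tabs(line)
    if not expanded.startswith(">"):
        raise ValueError("not a block quote line")
    rest = expanded[1:]
    if rest.startswith(" "):
        rest = rest[1:]  # the optional space belongs to the delimiter
    if not rest.startswith("    "):
        raise ValueError("not an indented code block")
    return rest[4:]


print(repr(expand_tabs(">\t\tfoo")))              # '>       foo' (7 spaces)
print(repr(blockquote_code_content(">\t\tfoo")))  # '  foo' (2 spaces)
```

Python's built-in str.expandtabs(4) performs the same column-based expansion, so ">\t\tfoo".expandtabs(4) gives the same intermediate string.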

Related

Does the CommonMark spec allow leading spaces before the list marker?

Is this a valid list in CommonMark?
1. Foo
- Bar
- Baz
2. Qux
I am concerned about the validity of two leading spaces before each list marker, i.e. 1., 2., etc. Is it valid to provide leading spaces before the list marker?
I am unable to find anything in the spec that explicitly mentions that it is okay to have leading spaces before each list marker in the CommonMark spec at http://spec.commonmark.org/0.27/.
But there are many examples which seem to show leading spaces used before the list marker. For example, see
http://spec.commonmark.org/0.27/#example-4
http://spec.commonmark.org/0.27/#example-9
But I would like the spec to clearly spell out that it is valid to put spaces before list markers. Can you find anything in the spec that clearly spells this out or at least implies this?
The specific rule is rule 4 of the list items section (which starts right after example 246):
Indentation. If a sequence of lines Ls constitutes a list item according to rule #1, #2, or #3, then the result of indenting each line of Ls by 1-3 spaces (the same for each line) also constitutes a list item with the same contents and attributes. If a line is empty, then it need not be indented.
Examples 247, 248 and 249 then show one, two and three spaces respectively, all of which are interpreted as list items. Finally, example 250 shows four spaces of indent resulting in a code block.
Of course the rule for indented blocks states (emphasis added):
An indented code block is composed of one or more indented chunks separated by blank lines. An indented chunk is a sequence of non-blank lines, each indented four or more spaces. The contents of the code block are the literal contents of the lines, including trailing line endings, minus four spaces of indentation.
Therefore, anything with less than four spaces of indentation is not a code block. A couple paragraphs later we find the following:
If there is any ambiguity between an interpretation of indentation as a code block and as indicating that material belongs to a list item, the list item interpretation takes precedence:
The example given shows a nested list item which is indented four or more spaces. However, that same example also indents the parent list item two spaces, so the rule could apply to both.
For comparison, the original Markdown rules explicitly stated:
List markers typically start at the left margin, but may be indented by up to three spaces.
This concept has existed in Markdown for many years.
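The 0-3 space rule can be sketched in Python as follows. This is a deliberately simplified illustration, not a CommonMark parser: the classify function and the LIST_MARKER pattern are made up for this example, and they only consider a single top-level line that is not inside a list item or other container and does not follow a paragraph.

```python
import re

# Simplified pattern: 0-3 leading spaces, a bullet or ordered list marker,
# then at least one space before the content.
LIST_MARKER = re.compile(r"^( {0,3})([-+*]|\d{1,9}[.)]) ")


def classify(line: str) -> str:
    """Classify one top-level line by its leading indentation."""
    indent = len(line) - len(line.lstrip(" "))
    if indent >= 4:
        return "indented code"  # four or more spaces: code block
    if LIST_MARKER.match(line):
        return "list item"      # marker indented by at most three spaces
    return "other"


for sample in ["1. Foo", "  1. Foo", "   - Bar", "    1. Foo"]:
    print(repr(sample), "->", classify(sample))
```

Inside a list item the picture changes, because, as quoted above, the list item interpretation takes precedence over the code block interpretation when the two conflict.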

HTML textarea cuts off beginning new lines

An HTML text area works fine with new lines ("\n") when they're after any other content in the text area, whether it be whitespace characters like spaces or tabs ("\t") or not.
However, when text area content begins with a new line (for example, "\ntest"), that new line gets cut off on display.
Any ideas on what causes this/how to remedy it?
This seems to be by the spec.
A single newline may be placed immediately after the start tag of pre and textarea elements. If the element's contents are intended to start with a newline, two consecutive newlines thus need to be included by the author.
Note that in the past there were some bugs in the various browsers regarding leading new lines in elements:
https://bugzilla.mozilla.org/show_bug.cgi?id=591988
https://bugs.chromium.org/p/chromium/issues/detail?id=62901
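When the markup is generated by a program, one way to follow the rule quoted above is to always emit a newline right after the start tag, so that the parser drops that one and keeps any newline belonging to the value. A minimal Python sketch, where render_textarea and simulate_parse are hypothetical helper names and simulate_parse is only a rough stand-in for what a browser does:

```python
import html


def render_textarea(value: str) -> str:
    """Serialize a textarea so that a leading newline in value survives."""
    # The extra newline after the start tag is the one the parser discards.
    return "<textarea>\n" + html.escape(value) + "</textarea>"


def simulate_parse(markup: str) -> str:
    """Rough stand-in for a browser: drop one newline after the start tag."""
    body = markup[len("<textarea>"):-len("</textarea>")]
    if body.startswith("\n"):
        body = body[1:]
    return html.unescape(body)


print(repr(simulate_parse(render_textarea("\ntest"))))  # '\ntest'
print(repr(simulate_parse(render_textarea("test"))))    # 'test'
```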

SSRS: prevent word break on plus and minus signs

I've got a textbox in an SSRS report. The textbox consists of 2 placeholders. The second one is long enough for the line to be split several times. I want the text to be wrapped on spaces, but it's wrapped on plus and minus signs instead. I need "a-b+" and "Ss-+" to be kept together.
Text is fetched from database, I have full control but can't predict exact length or particular order.
My guess is that engineers who've implemented wrapping thought of plus and minus signs as a part of math formula. That's wrong in my case.
So far I've tried to add HTML tags: makes each block occupy whole line and makes no effect. I need something like display: inline-block
I've tried creating several placeholders for each non-breaking value - no effect.
If I replace plus and minus signs with letters, the placeholder wraps text just fine.
One obvious solution would be to calculate required character length to add manual line breaks (vbcrlf). But it can't be done easily since it's not a monospaced font.
Is it possible to prevent word wrapping on plus and minus signs?

How to replicate the function of the tilde character in LaTeX?

In LaTeX the tilde character is used to set a space between two text elements that cannot be separated by a line break. This is useful to keep citations right next to the citing text, for instance. It is also most useful when presenting figures according to the ISU. As an example, the code:
1~400~t/a
Guarantees the following text output, always packed together in the same line:
1 400 t/a
Is there a way to mimic this behaviour in HTML?
A non-breaking space, in HTML, is expressed as &nbsp;
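If the HTML is generated from plain text, the substitution can be automated. A small Python sketch, assuming every ordinary space in the string should become non-breaking; the no_break function is a made-up name for this example:

```python
import html


def no_break(text: str) -> str:
    """Escape text for HTML and turn ordinary spaces into &nbsp; entities."""
    return html.escape(text).replace(" ", "&nbsp;")


print(no_break("1 400 t/a"))  # 1&nbsp;400&nbsp;t/a
```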

Microsoft Word / Outlook trims leading space

I send SQL to interested parties via Outlook as HTML, with Word as the editor.
I like to format my SQL using spaces, rather than tabs.
When I paste the SQL into the editor, formatting is spot on.
But the 'sent' version removes leading spaces.
For example:
Select
*
From
Employees
becomes
Select
*
From
Employees
Is there an option to prevent this?
I didn't find a solution, but I found a workaround: replace all (and only) leading spaces with Nonbreaking Spaces. There is no need to replace every space, only the leading ones. This way Outlook will not trim them automatically when sending the email.
Before sending the email:
Select the text in which you want to keep leading spaces, and run Find and Replace (Ctrl+H).
In Find what, enter "^p " (caret, p, space), or click the [Special] button at the bottom and choose Paragraph Mark, then type a space character.
In Replace with, enter "^p^s" (caret, p, caret, s), or click the [Special] button at the bottom and choose Paragraph Mark and then Nonbreaking Space.
Press [Replace All]. It will replace leading spaces with Nonbreaking Spaces only in the selected text, leaving the rest of the text unchanged.
Result with leading spaces:
First without leading spaces
One leading space in this row
No leading spaces again
One leading space in this row
Two leading spaces here
One leading space in this row
No leading spaces again
One leading space in this row
Two leading spaces here
Two leading spaces here
One leading space in this row
Two leading spaces here
Two leading spaces here
Three leading spaces here
With leading nonbreaking spaces:
First without leading spaces
One leading space in this row
No leading spaces again
One leading space in this row
Two leading spaces here
One leading space in this row
No leading spaces again
One leading space in this row
Two leading spaces here
Two leading spaces here
One leading space in this row
Two leading spaces here
Two leading spaces here
Three leading spaces here
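The same substitution could also be done before pasting, outside of Word. The Python sketch below is only an illustration: protect_leading_spaces is a made-up name, the indentation of the sample SQL is assumed, and whether Outlook keeps the pasted non-breaking spaces rests on the behaviour described in this answer rather than on anything the code can verify.

```python
import re

NBSP = "\u00a0"  # non-breaking space, the character Word inserts for ^s


def protect_leading_spaces(text: str) -> str:
    """Replace every leading space on each line with a non-breaking space."""
    return re.sub(
        r"^ +",
        lambda m: NBSP * len(m.group()),
        text,
        flags=re.MULTILINE,
    )


# Sample input with assumed four-space indentation.
sql = "Select\n    *\nFrom\n    Employees"
print(protect_leading_spaces(sql))
```

Spaces inside a line are left alone, so only the indentation changes.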
Brief: Send e-mail as Plain Text to preserve whitespace.
After spending faaaar too much time trying (and failing) to override HTML paragraph styles, override list styles/characters, and tweak AutoFormat settings in Outlook 2013, the fallback solution may be to send any e-mails for which it is important to preserve whitespace as Plain Text.
One can send an e-mail as Plain Text by:
Selecting the Format Text ribbon tab.
Selecting Plain Text within the Format group.