text line breaks in markup between elements - html

I have a bug that results from actual line breaks in my HTML markup. I'm trying to find documentation that supports what I'm seeing. This has nothing to do with <br> or CSS; it's about the newlines within the markup itself. Basically, this:
<h4 class="condensed-title-wrap">
<span class="condensed-title">Here is the title text</span> <i class="icon-icon-navigation-mobile-arrow-right"></i>
</h4>
Renders differently than this:
<h4 class="condensed-title-wrap">
<span class="condensed-title">Here is the title text</span>
<i class="icon-icon-navigation-mobile-arrow-right"></i>
</h4>
What I'm asking for is language and maybe some documentation that this is true.

The invisible carriage returns and line feeds at the ends of the lines count as whitespace (unlike non-breaking whitespace such as &nbsp;). HTML requires that certain sequences of whitespace be collapsed into a single space character. In your second example, the text breaks at the whitespace just before the <i> element, which includes the preceding line break and the leading spaces.
This is mentioned in several places in the spec, one of which is quoted here:
When a user agent is to strip and collapse whitespace in a string, it
must replace any sequence of one or more consecutive space characters
in that string with a single U+0020 SPACE character, and then strip
leading and trailing whitespace from that string.
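The string-level algorithm quoted above is small enough to sketch in Python. This is a minimal illustration, assuming the spec's definition of "space characters" (TAB, LF, FF, CR and SPACE; the current spec calls them ASCII whitespace); the function name is hypothetical. The spec applies this algorithm to strings such as the document title, while the on-screen rendering of text content is governed by CSS white-space processing, which follows similar but not identical rules.

```python
import re

# The HTML spec's "space characters" (ASCII whitespace): TAB, LF, FF, CR, SPACE.
# U+00A0 NO-BREAK SPACE (&nbsp;) is deliberately not in this set.
_SPACE_RUN = re.compile(r"[\t\n\f\r ]+")


def strip_and_collapse_whitespace(s: str) -> str:
    """Collapse each run of space characters to one U+0020, then trim both ends."""
    collapsed = _SPACE_RUN.sub(" ", s)
    return collapsed.strip(" ")


print(repr(strip_and_collapse_whitespace("\n  Here is\tthe  title text\n")))
# 'Here is the title text'
print(repr(strip_and_collapse_whitespace("a\u00a0\u00a0b")))
# 'a\xa0\xa0b'  (non-breaking spaces survive)
```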
This issue is often seen when trying to place a pair of inline-block elements side by side: instead of touching, they end up with a gap (often about 4 pixels) between them.
This code produces such a gap:
<div>
<span>left block</span>
<span>right block</span>
</div>
While this code does not:
<div>
<span>left block</span><span>
right block</span>
</div>
The gap comes from the whitespace between the end of one span and the beginning of the next. The fix moves that whitespace from between the span elements to the start of the second span's content, where (for inline-block elements) it sits at the beginning of a line and is removed rather than collapsed into a visible space.
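The fix above moves the whitespace by hand. When the markup comes out of a template or build step, a different approach (not part of the answer above) is to strip whitespace-only runs between tags before serving the page. A minimal regex sketch, assuming the markup is trusted and simple; the helper name is hypothetical:

```python
import re

# Whitespace-only runs between one tag's ">" and the next tag's "<".
# Only ASCII whitespace is matched, so a literal U+00A0 is left alone.
_BETWEEN_TAGS = re.compile(r">[\t\n\f\r ]+<")


def remove_intertag_whitespace(markup: str) -> str:
    """Hypothetical build-step helper: delete whitespace that sits between tags.

    Caveats: this is a plain regex, not an HTML parser. It also removes
    spaces you may want (e.g. "<b>a</b> <i>b</i>" loses its space) and it
    does not skip <pre>, <textarea>, scripts or comments.
    """
    return _BETWEEN_TAGS.sub("><", markup)


source = """<div>
  <span>left block</span>
  <span>right block</span>
</div>"""
print(remove_intertag_whitespace(source))
# <div><span>left block</span><span>right block</span></div>
```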

Related

How can I make leading or trailing spaces visible in a Rails HTML view?

I am displaying data that is being pulled one-way from another system and I would like to highlight (not trim) trailing and leading spaces so people can see the unwanted spaces and update the data in the source system.
I'm using Rails so the obvious solution is to use the ActionView::Helpers::TextHelper.highlight method but I had some trouble getting it to behave the way I want/expect:
<%= highlight(' Adam John ', /^\s*|\s*$/) %>
<!-- renders -->
<mark> </mark>Adam John<mark> </mark><mark></mark>
This generates the HTML I wanted but it turns out neither Blink nor WebKit mark leading or trailing spaces...
The same thing happens when I try to use <span> instead of <mark>... The output is correct but browsers don't display it.
<%= highlight(" Adam John ", /^\s*|\s*$/, highlighter: '<span class="bg-blue-700">\1</span>') %>
<!-- renders -->
<span class="bg-blue-700"> </span>
Adam John
<span class="bg-blue-700"> </span>
<span class="bg-blue-700"></span>
Is there a better way to achieve this than using highlight? My end goal is to make it clear that there are extra characters there: either highlight them in yellow, or show something like what text editors and word processors do when "Invisibles" are turned on, but only for the leading and trailing invisibles.
While researching this question I came up with a solution (and a variation) that I am sharing here. If anyone has a better answer I will happily accept that instead.
Instead of trying to fight rendering engines, replace the unwanted space(s) with some other character. A subtle dot similar to what word processors and text editors often use can be achieved like this (utility classes from Tailwind CSS):
<%= highlight(" Adam John ", /^\s*|\s*$/, highlighter: '<mark class="bg-white text-gray-400">·</mark>') %>
...or highlight the unwanted spaces bright yellow by replacing them with non-breaking spaces (or adding an HTML entity to force it to display):
<%= highlight(" Adam John ", /^\s*|\s*$/, highlighter: '<mark>&nbsp;</mark>') %>
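The same idea can be expressed without highlight. Below is a Python analogue (not Rails code, and not the highlight API) of a hypothetical helper that HTML-escapes the string and replaces each leading or trailing whitespace character with a visible marker inside <mark>. Matching the whole string once, rather than using /^\s*|\s*$/, also avoids the empty <mark></mark> produced by the zero-length match at the end of the string.

```python
import html
import re

# Split a string into leading whitespace, the middle, and trailing whitespace.
# \A and \Z anchor to the whole string; DOTALL lets the middle span newlines.
_EDGES = re.compile(r"\A(\s*)(.*?)(\s*)\Z", re.DOTALL)


def mark_edge_spaces(text: str, marker: str = "&nbsp;") -> str:
    """Wrap leading/trailing whitespace in <mark>, one marker per character.

    The marker is inserted as raw HTML (e.g. "&nbsp;" or "·"); the middle
    of the string is escaped so user data cannot inject markup.
    """
    leading, middle, trailing = _EDGES.match(text).groups()

    def wrap(ws: str) -> str:
        # Emit nothing for an empty run, so no empty <mark></mark> appears.
        return f"<mark>{marker * len(ws)}</mark>" if ws else ""

    return wrap(leading) + html.escape(middle) + wrap(trailing)


print(mark_edge_spaces(" Adam John "))
# <mark>&nbsp;</mark>Adam John<mark>&nbsp;</mark>
```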
Side notes
I spent some time figuring out how <mark> works with different spaces. I experimented with invisible characters like &zwj; (zero width joiner) and &shy; (soft hyphen), as well as regular spaces and &nbsp;:
<h6 class="font-medium text-sm text-gray-700">How &lt;mark&gt; handles spaces</h6>
<div class="mb-2 text-sm leading-5 text-gray-700">
<p>&lt;mark&gt;<mark> </mark>can<mark> </mark>highlight<mark> </mark>spaces<mark> </mark><strong>between</strong><mark> </mark>words...</p>
<p><mark> ...but regular leading and trailing spaces are ignored </mark></p>
<p><mark> </mark> <!-- this line doesn't even get displayed --> <mark> </mark></p>
<p><mark>&nbsp;Non-breaking leading and trailing spaces can be highlighted as part of a longer string...&nbsp;</mark></p>
<p><mark>&nbsp;</mark>...or even on their own<mark>&nbsp;</mark></p>
<p><mark>&zwj; A zero width joiner outside a space means the space gets highlighted... &zwj;</mark></p>
<p><mark>&zwj; </mark>...even on their own:<mark> &zwj;</mark></p>
<p><mark>&zwj; </mark> <mark> &zwj;</mark></p>
<p><mark>&shy; A short hyphen is shy but outside a space means the space gets highlighted... &shy;</mark></p>
<p><mark>&shy; </mark>...even on their own:<mark> &shy;</mark></p>
<p><mark>&shy; </mark> <mark> &shy;</mark></p>
</div>

XPath for an element that follows some specific paragraph text nested in a div?

I'm trying to select the text "Part Sun, Sun" and "Herb", "Houseplant" from the html below.
The <div class="specifics"> has more of these "row" divs and the text I'm interested in always comes after certain paragraph tags containing specific text like "Light:", and "Type:" below.
Edit: To clarify, out of all the "value" divs I'm only interested in the ones with specific "names". So I want to check the text of the paragraphs nested inside <div class="name"> elements and, if it's what I'm interested in, select the text inside the following <div class="value"> element.
<div class="specifics">
<div class="row">
<div class="name">
<p>Light:</p>
</div>
<div class="value">
<p>Part Sun, Sun</p>
</div>
</div>
<div class="row">
<div class="name">
<p>Type:</p>
</div>
<div class="value">
<p>
Herb, Houseplant
</p>
</div>
</div>
...more rows...
</div>
I've tried this (using Scrapy):
trait = response.xpath("//div[@class='specifics']")
trait.xpath(".//div[@class='row']/div[@class='name']/p[text()='Light:']/../../div[@class='value']/p/text()[normalize-space()]")
The first line is ok but the second one is returning \n \n
Apologies for the poor editing originally; below is what the paragraph element actually looks like.
Second edit: There are a bunch of empty lines, and when I select just /p without text() I still get back only a bunch of \n without any of the text. I tried normalize-space as above.
<p>
Part Sun,
Sun
</p>
To select the elements you need, you can do something like this:
/div[@class='specifics']/div[@class='row']/div[@class='value']/p
Adding /text() on the end will grab the Part Sun, Sun in your first row, but because your second row has additional nested elements in it, that text won't be picked up.
Instead you can use /string(), which also extracts text from child elements: /div[@class='specifics']/div[@class='row']/div[@class='value']/p/string()
If you also need to strip out whitespace, you can use either normalize-space() or translate(input, charsToReplace, replacement).
/div[@class='specifics']/div[@class='row']/div[@class='value']/p/normalize-space(string()). Using an online XPath tester, I get output of String='Part Sun, Sun' and String='Herb, Houseplant'.
/div[@class='specifics']/div[@class='row']/div[@class='value']/p/translate(string(), '&#xA;', '') where &#xA; is the newline character; you could also add any other characters you need removed.
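A note on XPath versions: using a function such as string() or normalize-space(string()) as the last step of a path is XPath 2.0 syntax. Scrapy's selectors are built on lxml, which implements XPath 1.0 only, so those expressions are likely to be rejected there. In XPath 1.0 you can select the value paragraphs with something like //div[@class='row'][normalize-space(div[@class='name']/p)='Light:']/div[@class='value']/p (untested against the real page) and normalize each result's whitespace afterwards. The sketch below shows the same row-matching logic in plain Python with the standard library's ElementTree instead of Scrapy, assuming the snippet is well-formed XML; the function names are hypothetical:

```python
import xml.etree.ElementTree as ET

SNIPPET = """
<div class="specifics">
  <div class="row">
    <div class="name"><p>Light:</p></div>
    <div class="value"><p>
      Part Sun,
      Sun
    </p></div>
  </div>
  <div class="row">
    <div class="name"><p>Type:</p></div>
    <div class="value"><p>
      Herb, Houseplant
    </p></div>
  </div>
</div>
"""


def normalize_space(text):
    """Roughly mimic XPath normalize-space(): trim and collapse whitespace.

    str.split() also splits on non-ASCII whitespace, which XPath does not.
    """
    return " ".join(text.split())


def specifics(markup, wanted):
    """Map each wanted name (e.g. "Light:") to its normalized value text."""
    root = ET.fromstring(markup.strip())
    result = {}
    for row in root.findall("div[@class='row']"):
        name_p = row.find("div[@class='name']/p")
        value_p = row.find("div[@class='value']/p")
        if name_p is None or value_p is None:
            continue
        # itertext() joins the text of the <p> and any nested children,
        # like XPath string() does.
        name = normalize_space("".join(name_p.itertext()))
        if name in wanted:
            result[name] = normalize_space("".join(value_p.itertext()))
    return result


print(specifics(SNIPPET, {"Light:", "Type:"}))
```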

tachyons.io: emphasize a single word in italics

My problem
I am using the tachyons.io framework to style an HTML page. I would like to emphasize a single word in a sentence with italics, without interrupting the flow of the sentence with a line break.
The manual
The manual suggests using <p class="i"></p>, but using p adds a line break.
What have I tried
Apart from reading the manual, I tried using a nested p class="i" and just class="i". The former broke the sentence with a new line; the latter did not change the word's style.
My question
How can I emphasize a single word with italics in tachyons.io, without breaking the sentence with a newline?
Use a <span>: <span class="i"> ... </span>.
(or, alternatively, an <i>.)
Example:
<link href="https://cdnjs.cloudflare.com/ajax/libs/tachyons/4.9.1/tachyons.css" rel="stylesheet"/>
<p>
Only one <span class="i">word</span> is in Italics here.
</p>
<p>
Can also use <i>plain HTML</i> for that.
</p>

How can I apply a style to a span of text that transcends other elements' boundaries?

Let's suppose I have the following paragraphs:
<p>one two </p> <p> three </p><p> four five </p>
Now let's suppose I want to style the words two, three, and four green, in place, without having any other effect on the document's structure or other layout. I basically want a <span> that transcends block level elements like <p>s. How can I accomplish this most simply? I could
<p>o <span>t</span></p><p><span>t</span></p><p><span>f</span> f</p>
But that makes things really messy due to the fact that I employ a markdown parser and have my own custom preprocessing code. What could I do so that there's only one "style begin" mark, and only one "style end" mark per contiguous length of green text?
You can have your text wrapped in a single <p> </p> and have a <span> inside that wrapping around the text you want to style, so:
<p>one <span>two three four</span> five</p>
http://jsfiddle.net/asbd9rdj/
Edit:
To target specific words in your multiple <p></p> tags, use a <span></span> as an inline element so you can attach styles to it.
<p>one <span>two</span></p>
<p>three <span>four</span></p>
example here: http://jsfiddle.net/79be8L6L/
"Interleaving" HTML tags is invalid. You should use 3 separate <span> tags, like in your second example.
Making your HTML generator handle this is unfortunately a necessary complexity in order to produce proper HTML.
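Here is a minimal sketch of that generator logic in Python, assuming the styled range is given as character offsets into the paragraphs' concatenated text (an assumption; a Markdown preprocessor would have its own representation). The function name and the "green" class are hypothetical. It emits a separate span inside each paragraph that the range touches, so the tags nest properly:

```python
import html


def wrap_across_paragraphs(paragraphs, start, end, cls="green"):
    """Wrap the characters in [start, end) of the concatenated paragraph text
    in <span class=cls>, opening and closing a separate span inside each <p>
    the range touches, so the tags never interleave."""
    css_class = html.escape(cls, quote=True)
    out = []
    offset = 0
    for text in paragraphs:
        p_start, p_end = offset, offset + len(text)
        # Intersection of the styled range with this paragraph.
        lo, hi = max(start, p_start), min(end, p_end)
        if lo < hi:
            a, b = lo - p_start, hi - p_start
            body = (
                html.escape(text[:a])
                + f'<span class="{css_class}">{html.escape(text[a:b])}</span>'
                + html.escape(text[b:])
            )
        else:
            body = html.escape(text)
        out.append(f"<p>{body}</p>")
        offset = p_end
    return "".join(out)


paras = ["one two ", " three ", " four five "]
# "two" starts at offset 4 and "four" ends at offset 20 of "one two  three  four five ".
print(wrap_across_paragraphs(paras, 4, 20))
```

With one start offset and one end offset per contiguous run, the generator only has to track a single "style begin" and "style end" mark, which is what the question asks for.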

Should I use the <p /> tag in markup?

I have always used either a <br /> or a <div/> tag when something more advanced was necessary.
Is use of the <p/> tag still encouraged?
Modern HTML semantics are:
Use <p></p> to contain a paragraph of text in a document.
Use <br /> to indicate a line break inside a paragraph (i.e. a new line without the paragraph block margins or padding).
Use <div></div> to contain a piece of application UI that happens to have block layout.
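Don't use <div /> or <p /> on their own. Those tags are meant to contain content, and in HTML (as opposed to XHTML) the trailing slash on a non-void element is ignored, so <p /> is parsed as an ordinary <p> start tag. They appear to work as paragraph breaks only because, when the browser sees them, it "helpfully" closes the current block before opening the new one.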
A <p> tag wraps around something, unlike an <input/> tag, which is a singular item. Therefore, there isn't a reason to use a <p/> tag.
I've been told that I'm using <br /> when I should use <p /> instead. – maxp
If you need to use <p> tags, I suggest wrapping the entire paragraph inside a <p> tag, which will give you a break at the end of the paragraph. But I don't suggest just substituting something like <p/> for <br/>.
<p> tags are for paragraphs and signifying the end of a paragraph. <br/> tags are for line breaks. If you need a new line then use a <br/> tag. If you need a new paragraph, then use a <p> tag.
A paragraph is a paragraph, and a break is a break.
A <p> is like a regular Return in Microsoft Office Word.
A <br> is like a soft return, Shift + Return in Office Word.
The first one applies all the paragraph settings and styles, while the second one merely breaks a line of text.
Yes, <p> elements are encouraged and won't get deprecated any time soon.
A <p> signifies a paragraph. It should be used only to wrap a paragraph of text.
It is more appropriate to use the <p> tag for this as opposed to <div>, because this is semantically correct and expected for things such as screen readers, etc.
Using <p /> has never been encouraged:
From XHTML HTML Compatibility Guidelines
C.3. Element Minimization and Empty Element Content
Given an empty instance of an element whose content model is not
EMPTY (for example, an empty title or
paragraph) do not use the minimized
form (e.g. use <p> </p> and not <p />).
From the HTML 4.01 Specification:
We discourage authors from using empty P elements. User agents should ignore empty P elements.
While they are syntactically correct, empty p elements serve no real purpose and should be avoided.
The HTML DTD does not prohibit you from using an empty <p> (a <p> element may contain PCDATA including the empty string), but it doesn't make much sense to have an empty paragraph.
Use it for what? All tags have their own little purpose in life, but no tag should be used for everything. Find out what you are trying to make, and then decide on what tag fits that idea best:
If it is a paragraph of text, or at least a few lines, then wrap it in <p></p>
If you need a line break between two lines of text, then use <br />
If you need to wrap many other elements in one element, then use the <div></div> tags.
The <p> tag defines a paragraph. There's no reason for an empty paragraph.
For most practical purposes, you don't need to add the </p> to your markup (HTML allows the end tag to be omitted). But if there is a strict XHTML adherence requirement, you would need to close all your markup tags, including <p>; an XHTML validator would report an unclosed <p> as an error.