Styling just comments inside a `pre` or `code` block with CSS

Is there a way to style comments inside a pre or code block (e.g. Ruby comments) using only CSS?
For example:
# I am a comment and should be lighter and italic
I = { :am => :normal_code, :and_want_no => :special_treatment }
I know you can use JavaScript/jQuery to insert <span> elements in the right spots (like the <span>s that Stack Overflow adds to the comment above), but can it be done with just CSS?
For background, I use a markdown renderer which outputs simple <pre> and <code> elements where necessary but without any hooks for indicating which language you're using and how to flag comments with <span> elements.

This task can't be done with just CSS.
CSS works at the element level and it is not possible to "select into" general text - even trivially, much less applying some rules to parse language grammar.
As noted, and as seen by inspecting the SO code rendering such as the one in this post, one approach is to output spans with the appropriate CSS classes (which are the result of separate grammar processing); these individual spans can then be selected and styled.
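As a rough illustration of that separate processing step, here is a deliberately naive JavaScript sketch that wraps whole-line Ruby comments in a span before the code goes into the page. The function names and the comment class are hypothetical. It only recognizes lines whose first non-blank character is #, so trailing comments, =begin/=end blocks and # inside strings such as "#{x}" are not handled; a real syntax highlighter tokenizes the language properly.

```javascript
// Escape the characters that matter when the result is assigned to innerHTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wrap every line whose first non-blank character is "#" in a span.
// Naive on purpose: trailing comments, =begin/=end blocks and "#" inside
// strings (e.g. "#{name}") are not recognized.
function highlightRubyComments(code) {
  return code
    .split("\n")
    .map((line) => {
      const escaped = escapeHtml(line);
      return /^\s*#/.test(line)
        ? `<span class="comment">${escaped}</span>`
        : escaped;
    })
    .join("\n");
}
```

In a browser it could be applied to the renderer's output with something like document.querySelectorAll("pre code").forEach((el) => { el.innerHTML = highlightRubyComments(el.textContent); }), after which a rule such as .comment { font-style: italic; opacity: 0.6; } styles only the comments. Reading textContent and escaping it again avoids turning any markup inside the code block into real elements.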

a) What markdown renderer?
b) This can't be done with CSS, whether with classes, IDs or pseudo-elements.
I will expand further as you do.
The problem is, you can't exactly render comments with your provided method, as these are technically never rendered in the first place.
Comments are meant to be non-runnable code that helps with debugging. Trying to add or manipulate comments would be a security breach and would require actually inserting a file into your code.
As for how that would work: it would be tricky unless you had the same comment, or multiple files, available to do so. I would suggest just importing your file, if necessary, alongside a duplicate commented version.

Related

Remove all inline html attributes, but leave some

I'm trying to write a PHP function with preg_replace that removes all inline attributes of HTML elements but keeps some, like 'href', 'title', 'alt'.
What I got until now is
([\w\-.:]+)\s*=\s*("[^"]*"|'[^']*'|[\w\-.:]+)
to match all inline attributes, but it also matches text like
href="test" Test
that has no HTML around it. Additionally, it matches every attribute, including the ones I want to keep.
See my example text here:
https://regex101.com/r/3OVaO2/1
The goal is to remove any dangerous html elements.
I know that I have to handle something for the href-attribute in an extra function.
As already mentioned in the comments, Regex is not the way to go here.
That said: I have come up with this (https://regex101.com/r/3OVaO2/2)
(<\w+\s*[^>]*)\s(?!href|title|alt)[\w\-\d]+=(?:(['"]).*?\2|\w+)
However, this will only remove ONE evil attribute per tag. The problem is that with PCRE you cannot have variable-length lookbehind assertions. If you switch to ECMAScript, you can do this (https://regex101.com/r/3OVaO2/3)
(?<=<\w+\s*[^>]*)\s(?!href|title|alt)[\w\-\d]+=(?:(['"]).*?\1|\w+)
This will probably do what you want it to do. Nonetheless, this is NOT the holy grail for sanitizing HTML. Be careful with your output if you don't consider your input safe.
Also, the definition of the tags may need some tweaking, since there may be tags like <some-element>, which are currently not detected by the regular expression.
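Both patterns can be tried directly in JavaScript, whose regular expressions accept the variable-length lookbehind in current engines such as V8/Node.js. The sketch below (function names are hypothetical) applies the lookbehind version globally, and also shows a common workaround for the PCRE limitation: rerun the first pattern until the string stops changing, since each pass removes at most one attribute per tag. The caveats above still apply: an attribute whose name merely starts with href, title or alt (e.g. alternate) is kept, and whitespace inside a quoted value such as title="a b=c" can be mistaken for an attribute.

```javascript
// Lookbehind variant (ECMAScript): removes every non-whitelisted
// attribute in a single global pass.
const EVIL_ATTR_LOOKBEHIND =
  /(?<=<\w+\s*[^>]*)\s(?!href|title|alt)[\w\-\d]+=(?:(['"]).*?\1|\w+)/g;

function stripAttributesLookbehind(html) {
  return html.replace(EVIL_ATTR_LOOKBEHIND, "");
}

// PCRE-compatible variant: the tag prefix is captured instead of looked
// behind, so each pass removes at most one attribute per tag.
const EVIL_ATTR_CAPTURE =
  /(<\w+\s*[^>]*)\s(?!href|title|alt)[\w\-\d]+=(?:(['"]).*?\2|\w+)/g;

// Repeat the replacement until nothing changes (a fixed point).
function stripAttributesLoop(html) {
  let previous;
  do {
    previous = html;
    html = html.replace(EVIL_ATTR_CAPTURE, "$1");
  } while (html !== previous);
  return html;
}
```

In PHP, the same loop would call preg_replace with the captured-prefix pattern and "$1" until the result equals its input.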

Add html element that is "invisible" or skipped by CSS selector rules

I want to build an external GUI that operates on a generic HTML piece that comes with associated CSS. In order to enable some functionalities of the GUI, I would need to create some "meta" HTML elements to contain parts of content and associate them with data.
Example:
<div id="root">
<foo:meta data-source="document:1111" data-xref="...">
sometext
<p class="quote">...</p>
</foo:meta>
<p class="other">...</p>
</div>
This HTML is auto-generated starting from already existing HTML that has associated CSS:
<div id="root">
sometext
<p class="quote">...</p>
<p class="other">...</p>
</div>
#root>p {
color:green;
}
#root>p+p {
color:red;
}
The problem is, when adding the <foo:meta> element, this breaks CSS child and sibling selectors. I am looking for a way for the CSS selectors to keep working when encapsulating content in this way. We have tried foo\:meta{display:contents} style, but, although it works in terms of hiding the meta element from the box renderer, it doesn't hide it from the selector matcher. We do not produce the HTML/CSS to be processed, so writing them in a certain way before processing is not an option. They come as they are, generic HTML documents with associated CSS.
Is there a way to achieve what we are looking for using HTML/CSS?
To restate, we are looking for a way to dynamically encapsulate parts of content in non-visual elements without breaking child and sibling CSS selectors. The elements should only be available to DOM traversal such as document.getElementsByTagName('foo:meta')
If I understood your problem correctly, I would suggest using a space (the descendant combinator) between the ancestor and the element instead of a '>'. Also, your selector is an id and not a class.
The selector you have put in matches only direct children, but the space lets you select grandchildren too!
So all you have to do is this:
#root .quote {
color:green;
}
Let me know if this helped.
So, after much fiddling and research, we came to the conclusion that this can't be done, even with Shadow DOM, as even that would require massive CSS rewrites that might not preserve semantics.
However, for anyone stumbling upon this question, we came to the same end by employing the following (I'll be short, pointers only):
using two comments to mark where the tag would start/end, instead of an XML tag (e.g. <!--<foo:bar data-source="1111">-->...content...<!--</foo:bar>-->)
these pointers work more or less like the markup equivalent of a DOM Range and they can work together with it.
this approach has the interesting advantage (as opposed to a single node) that it can start and end in different nodes, so it can span subtrees.
But this also breaks the XML structure when you try to recompose it. Also it's quite easy by manipulation to end up with the range end moving before the range start, multiple ranges overlapping etc.
In order to recompose it (to send to a next XML processor or a NoSQL XML database for cross-referencing), we need to make sure we avoid the XML-breaking manipulations described above; then, one only needs to convert the encapsulated tags to regular tags with string manipulation on the document's (X)HTML (innerHTML, outerHTML, XMLSerializer) to get clean XML that can be mined and cross-referenced for content.
We used the TreeWalker API to scan the document for comments; you might need it too, although scanning the document for comments this way can be slow (it works for us, though). If you are bolder, you can try XPath, e.g. document.evaluate('//comment()', document), which seems to work, but we are not sure all browsers comply.
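The string-manipulation step described above might look roughly like this. It is a sketch under assumptions, not the authors' code: it assumes markers written exactly like the example (<!--<foo:bar ...>--> and <!--</foo:bar>-->), a fixed foo: prefix, and no > inside attribute values; the function names are hypothetical. It first checks that every end marker closes the most recent start marker (catching the reversed and overlapping ranges mentioned above), then turns the markers into regular tags. It does not detect a range that starts and ends under different parent elements, which would still produce XML that is not well-formed.

```javascript
// Matches a marker comment that wraps a start or end tag with the foo: prefix,
// e.g. <!--<foo:bar data-source="1111">--> or <!--</foo:bar>-->.
// Assumes no ">" appears inside attribute values.
const MARKER = /<!--<(\/?)(foo:[\w-]+)([^>]*)>-->/g;

// Returns true when every end marker closes the most recently opened
// start marker, i.e. the ranges nest properly and none is reversed.
function markersAreBalanced(html) {
  const open = [];
  for (const [, slash, name] of html.matchAll(MARKER)) {
    if (!slash) {
      open.push(name);
    } else if (open.pop() !== name) {
      return false; // end before start, or overlapping ranges
    }
  }
  return open.length === 0; // every start marker must be closed
}

// Converts the comment markers into real elements so that the
// result can be handed to an XML processor.
function unwrapMarkers(html) {
  if (!markersAreBalanced(html)) {
    throw new Error("Marker ranges overlap or are out of order");
  }
  return html.replace(MARKER, "<$1$2$3>");
}
```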

Apply CSS for empty element including space and comments [duplicate]

We have a selector :empty that can match an element when it is completely empty:
<p></p>
But I have a condition where it might be empty, or it might contain line breaks or blank spaces:
<p> </p>
I found a solution for Firefox, :-moz-only-whitespace:
:empty { width: 300px; height: 15px; background: blue; }
:-moz-only-whitespace { width: 300px; height: 15px; background: orange; }
<p></p>
<p> </p>
<p>This is paragraph three.</p>
Is there a similar solution for other browsers?
Lots of people are missing the point of this question, which I've addressed in the following exposition, but for those just looking for the answer, I'm mirroring the last paragraph here:
Selectors 4 now redefines :empty to include elements that contain only whitespace. This was originally proposed as a separate pseudo-class :blank but was recently retconned into :empty after it was determined that it was safe to do so without too many sites depending on the original behavior. Browsers will need to update their implementations of :empty in order to conform to Selectors 4. If you need to support older browsers, you will have to go through the hassle of marking elements containing only whitespace or pruning the whitespace before or after the fact.
While the question depicts a <p> element containing a handful of regular space characters, which seems like an oversight, it is far more common to see markup where elements contain only whitespace in the form of indentation and blank lines, such as:
<ul class="items">
<li class="item">
<div>
<!-- Some complex structure of elements -->
</div>
</li>
<li class="item">
</li> <!-- Empty, except for a single line break and
indentation preceding the end tag -->
</ul>
Some elements, like <li> in the above example as well as <p>, have optional end tags, which can also cause unintended side effects in DOM processing in the presence of inter-element whitespace. For example, the following two <ul> elements don't produce equivalent node trees; in particular, the first one does not result in a li:empty in Selectors level 3:
li:empty::before { content: '(empty)'; font-style: italic; color: #999; }
<ul>
<li>
</ul>
<ul>
<li></li>
</ul>
Given that HTML considers inter-element whitespace to be transparent by design, it's not unreasonable to want to target such elements with CSS without having to resort to modifying the HTML or the application generating it (especially if you end up having to implement and test a special case just to do so). To that end, Selectors 4 now redefines :empty to include elements that contain only whitespace. This was originally proposed as a separate pseudo-class :blank but was recently retconned into :empty after it was determined that it was safe to do so without too many sites depending on the original behavior. Browsers will need to update their implementations of :empty in order to conform to Selectors 4. If you need to support older browsers, you will have to go through the hassle of marking elements containing only whitespace or pruning the whitespace before or after the fact.
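If you go the route of pruning the whitespace before the fact, a minimal sketch is a pass over the markup before it is served. It assumes a build or server-side step that can run JavaScript, a hypothetical default list of elements to prune (p and li), and explicit end tags; it will not fix the <li> with an omitted end tag shown above, and it has the usual limits of treating HTML as text.

```javascript
// Remove whitespace that is the only content between a start tag and the
// matching end tag, for the given element names, so that a Selectors 3
// :empty matches them after parsing.
function pruneWhitespaceOnlyElements(html, tagNames = ["p", "li"]) {
  // \b stops "p" from also matching <pre>; \2 requires the same end tag.
  // [ \t\n\r\f] is CSS white space; JavaScript's \s would also eat U+00A0.
  const pattern = new RegExp(
    "(<(" + tagNames.join("|") + ")\\b[^>]*>)[ \\t\\n\\r\\f]+(</\\2>)",
    "gi"
  );
  return html.replace(pattern, "$1$3");
}
```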
BoltClock provided a fantastic answer to this question, showing that this cannot currently be achieved with CSS alone (that is, with what browsers implement of Selectors Level 3).
BoltClock mentioned that elements that are truly empty (which is a weird definition, as explained) can be targeted with the :empty pseudo-class. This pseudo-class was introduced in Selectors Level 3 and, as defined there, WILL NOT select elements whose only content is whitespace.
BoltClock stated that the only way to clean up elements that have only whitespace as content is to fix the HTML, but that is not entirely correct. This can also be done with JavaScript.
KEEP IN MIND! The JavaScript that I am offering to solve this issue may take a very long time to execute, so the best method is to clean up the raw HTML instead if possible. If that is not possible, then this may work as a solution, as long as your DOM tree is not too large.
I'll walk through the steps of how to write the script yourself...
First of all, launch everything after page load.
You need to make sure that the DOM has fully loaded before running your script, so add an event listener for page load (DOMContentLoaded would also work; it fires as soon as the document has been parsed, without waiting for images to load):
window.addEventListener("load", cleanUpMyDOM);
...and, of course, before that, create a function called cleanUpMyDOM. We will write the rest of our logic within this function.
Second, gather the elements that we are checking.
In our example we are going to check the entire DOM, but this is where our script can get VERY expensive and may make your page unresponsive. You may want to limit the number of nodes you are iterating over.
We can grab the nodes in question by using document.querySelectorAll. What's nice about this method is that it returns a flat list of every matching element, so we won't have to recurse through the children of each node.
var nodes = document.querySelectorAll("*");
As I said earlier, this code will grab EVERY DOM node, and that is probably NOT a good idea.
For example, I am working with WordPress, and some of the internal pages have some junk in them. Luckily, it is all in p elements that are descendants of a div.body element, so I can change my selector to document.querySelectorAll("div.body p"), which will select only p elements inside my div.body element, at any depth. This will greatly optimize my script.
Third, iterate the nodes and find the empty ones.
We'll create a loop over the nodes list and check each node in it to see if it is empty. If it is, we'll apply a class called blank to it.
I'm just shooting from the hip here, so if you notice a mistake in this code, please let me know.
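for(var i = 0; i < nodes.length; i++){
  // Blank = no child elements and no text other than whitespace.
  // textContent skips comments, so those are ignored as well. Reading it
  // leaves the DOM alone; rewriting innerHTML would re-create every
  // descendant (detaching the nodes still waiting in the list).
  if(nodes[i].children.length === 0 && !nodes[i].textContent.trim())
    nodes[i].classList.add("blank");
}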
I am sure that there is a cleaner way to write the loop above, but this should get the job done.
Lastly, all you need to do is target the blank elements with your CSS.
Add this rule to your stylesheet:
.blank {
display:none;
}
And there you have it! All of your "blank" nodes have been hidden.
For those who just want to jump ahead, here is the finished script:
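function cleanUpMyDOM(){
  // Narrow this selector (e.g. "div.body p") if you can: "*" also matches
  // void elements such as <img> and <br>, which have no content either.
  var nodes = document.querySelectorAll("*");
  for(var i = 0; i < nodes.length; i++){
    // Blank = no child elements and only whitespace (or comments) inside.
    if(nodes[i].children.length === 0 && !nodes[i].textContent.trim())
      nodes[i].classList.add("blank");
  }
}
window.addEventListener("load", cleanUpMyDOM);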
Once again, if you notice any issues with my code, please let me know in the comments below.
Hope this helps!
P.S. Many people may be wondering why you would want to do this, as it does feel like bad practice. I would avoid doing this, but I am in a situation where I am starting to consider it. The content of the pages on my site is created through a WYSIWYG editor. This content is created and modified constantly by the marketing team, and I get pretty overwhelmed handling the support for their slip-ups. It's not my job to fix WordPress's WYSIWYG editor (nor would I ever want to), but I could write a very simple script that can handle some of the work for me. Apart from training the support team to manage their whitespace when making edits, that definitely seems like the better answer to me.
For anyone looking at the exact link: https://drafts.csswg.org/selectors-4/Overview.bs#the-empty-pseudo
And TL;DR:
Note: In Level 2 and Level 3 of Selectors, :empty did not match elements that contained only white space. This was changed so that that-- given white space is largely collapsible in HTML and is therefore used for source code formatting, and especially because elements with omitted end tags are likely to absorb such white space into their DOM text contents-- elements which authors perceive of as empty can be selected by this selector, as they expect.
And :empty will consider spaces as empty from v4 onwards:
content nodes (such as DOM text nodes, and entity references) whose data has a non-zero length must be considered as affecting emptiness; comments, processing instructions, and other nodes must not affect whether an element is considered empty or not.
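To make the difference between the two definitions concrete, here is a sketch of the matching rule as a plain JavaScript predicate. It follows the quoted wording (element children and non-empty text count, comments and processing instructions never do) and, for Level 4, ignores text that is only white space. The function name and the level parameter are hypothetical, and "white space" is assumed to mean the CSS white space characters (space, tab, line feed, carriage return, form feed). It reads only the standard nodeType, data and childNodes properties, so it works on real DOM elements as well as on plain objects.

```javascript
// Standard DOM nodeType values (Node.ELEMENT_NODE etc. in browsers).
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;
const CDATA_SECTION_NODE = 4;

// Space, tab, line feed, carriage return and form feed only.
const WHITESPACE_ONLY = /^[ \t\n\r\f]*$/;

// Returns whether el would match :empty under the Level 3 rule
// (any non-empty text counts) or the Level 4 rule (whitespace-only
// text does not count). Comments and processing instructions never count.
function matchesEmpty(el, level = 4) {
  for (const child of el.childNodes) {
    if (child.nodeType === ELEMENT_NODE) return false;
    if (child.nodeType === TEXT_NODE || child.nodeType === CDATA_SECTION_NODE) {
      const affects = level >= 4
        ? !WHITESPACE_ONLY.test(child.data)
        : child.data.length > 0;
      if (affects) return false;
    }
  }
  return true;
}
```

In a cleanup script like the one above, if (matchesEmpty(nodes[i])) nodes[i].classList.add("blank") could serve as the emptiness check.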

How to express a page break semantically correct in HTML?

I'm editing books/articles in HTML. These texts were once printed, and I scan them, convert them into an intermediate XML format and then transform them into HTML (by XSLT). Because some of those texts are out of print today and are only available through the major libraries, I want to publish them in a way that lets people cite them by referring to the page numbers in the original document. For this purpose my intermediate XML format has an element that marks a page break. Right now I'm working on the XML->HTML transformations and I'm wondering how to transform these page breaks into HTML. They should not appear in the final HTML by default (so a simple | doesn't fit), but I plan to wrap these documents with some lightweight JavaScript that will show the markers when needed. I thought about <span>s with a | in them that are hidden by default.
Is there a better, possibly 'semantic' approach to this problem?
Page breaks are very much a thing of layout, and HTML isn't designed to describe layout, so you aren't going to find anything that is semantic for this within the language.
The best you can hope for is some sort of kludge.
Since a page break can occur in the middle of a paragraph, and <p> elements can contain only inline elements you can eliminate most of the options from the outset.
The two possibilities that suggest themselves to me are <span> and <a>. The former has no semantics; the latter is designed to be linked to (with an id attribute, or the older name attribute, which HTML5 makes obsolete on <a>) or from (with an href attribute), and you could consider a page from an original document something that you might wish to link to.
No matter what element you use, I wouldn't include a marker in it and then hide it with CSS. That sort of presentational flag is something I would consider adding via :before in a stylesheet (combined with a descendant selector for a body class that can be toggled with JS, since you want the toggle).
Alternatively, if you want to take a (very) broad view of the meaning of "HTML", you could consider the l element (from the defunct XHTML 2 drafts) and mark up each line of the original document. Adding a class would indicate where a new page began (and you could use CSS counters and borders to clearly indicate each page and number it, should you so wish). Pity the browser vendors refused to get behind a real semantic markup language and favoured HTML 5 instead.
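A sketch of the span-or-anchor suggestion combined with the :before toggle. The question does not name its page-break element, so this assumes a TEI-style <pb n="12"/> in the intermediate XML, and it does the conversion with a JavaScript string replacement rather than the XSLT the question uses; the page-break class and the page-N ids are hypothetical. The generated element is empty, so nothing shows by default, and the id lets readers link to a page of the original.

```javascript
// Turn <pb n="..."/> page-break markers into empty spans. The visible marker
// is added later by CSS, so the HTML itself contains no "|".
function renderPageBreaks(xml) {
  return xml.replace(/<pb\s+n="([^"<>&]*)"\s*\/>/g, (match, page) => {
    // ids may not contain spaces, so collapse any whitespace to "-".
    const id = "page-" + page.replace(/\s+/g, "-");
    return `<span class="page-break" id="${id}" data-page="${page}"></span>`;
  });
}
```

A stylesheet rule such as body.show-pages .page-break::before { content: "[p. " attr(data-page) "]"; } would then display the markers only while a script has added the class, e.g. with document.body.classList.toggle("show-pages").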
Use a <div class="Page"> for each page, and have a stylesheet containing:
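.Page {
  break-after: page;        /* current property (CSS Fragmentation) */
  page-break-after: always; /* older name, still widely supported */
}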
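Maybe you can use an XML-style tag that HTML doesn't interpret, like <pagebreak></pagebreak>. Browsers treat unknown elements as generic inline elements, so an empty one renders nothing when viewing the HTML; then, when asked, jQuery or any other JavaScript library can transform these particular tags into a standard or any other visual mark. Note that in HTML (text/html) the trailing slash in <pagebreak/> is ignored, so the element would stay open and swallow the text that follows; write an explicit end tag instead.
I think this can be a semantic approach...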