We build bespoke WordPress themes, and recently have been receiving complaints regarding the sequence of headings. Most automated tools, including Google's Lighthouse, suggest that you should never skip heading levels, in order to properly communicate page structure for screen readers and other accessibility tools.
This issue is largely due to the way our clients enter content. They tend to prefer picking a visually pleasing heading, rather than the "correct" heading sequentially, so we'll often end up with pages that have an h1, then an h4, then a set of h2s, and so on. We've told these clients that they can fix this by properly entering content, but this seems to be asking too much of them, much like entering alt text for images.
To "solve" this issue, I'm trying to write a filter that will parse the_content, identify all of the headings, and replace their tags so that they become sequential, retaining classes for styling. I realize that this isn't a perfect solution, as the intended heading structure really can't be assumed programmatically, but this is the only viable solution I've been able to determine (if someone has a better idea, please, do tell).
So, for example, the code the user generates could be something like this:
<h2 class="title--h2">This is a second level heading</h2>
<p>Etiam vitae erat ullamcorper ipsum ultrices convallis ac quis nulla. Nam euismod imperdiet enim eu venenatis. Nulla non bibendum dui. Maecenas id tincidunt orci. Sed pellentesque ipsum et tempor convallis. Etiam elementum augue aliquet enim venenatis tincidunt. Praesent nunc dolor, vulputate nec aliquet consectetur, aliquet nec elit. Vivamus non eros nec nibh vestibulum lacinia. Morbi diam turpis, accumsan ac fringilla eget, fringilla vitae lorem. Ut consequat tortor orci, sed lobortis metus facilisis nec. Nulla sed enim in tortor blandit aliquet. Curabitur a finibus mi.</p>
<h4 class="title--h4">This is a fourth level heading</h4>
<p>Nullam blandit, mauris vel vestibulum aliquet, quam lectus laoreet mi, id euismod ligula augue sit amet velit. Suspendisse suscipit lacus quis mauris varius, sed cursus mi auctor. Nullam non augue in ante malesuada blandit. Nam eu purus commodo, porttitor odio commodo, tristique nunc. Suspendisse vitae vehicula turpis. Aenean turpis nibh, auctor ac mollis congue, iaculis id tortor. Morbi in est erat. Proin aliquam varius neque a sollicitudin. Vestibulum varius in urna sit amet hendrerit.</p>
<h4 class="title--h4">This is a fourth level heading</h4>
<p>Donec vitae est sapien. Nulla facilisi. Quisque sed auctor ante, sed viverra elit. Quisque justo arcu, vulputate tempor odio ac, mollis blandit justo. Morbi viverra tincidunt leo vel mattis. Aliquam erat volutpat. Nunc tortor tellus, porta sit amet tellus sed, interdum condimentum ex. </p>
And the output would be:
<h2 class="title--h2">This is a second level heading</h2>
<p>Etiam vitae erat ullamcorper ipsum ultrices convallis ac quis nulla. Nam euismod imperdiet enim eu venenatis. Nulla non bibendum dui. Maecenas id tincidunt orci. Sed pellentesque ipsum et tempor convallis. Etiam elementum augue aliquet enim venenatis tincidunt. Praesent nunc dolor, vulputate nec aliquet consectetur, aliquet nec elit. Vivamus non eros nec nibh vestibulum lacinia. Morbi diam turpis, accumsan ac fringilla eget, fringilla vitae lorem. Ut consequat tortor orci, sed lobortis metus facilisis nec. Nulla sed enim in tortor blandit aliquet. Curabitur a finibus mi.</p>
<h3 class="title--h4">This is a fourth level heading</h3>
<p>Nullam blandit, mauris vel vestibulum aliquet, quam lectus laoreet mi, id euismod ligula augue sit amet velit. Suspendisse suscipit lacus quis mauris varius, sed cursus mi auctor. Nullam non augue in ante malesuada blandit. Nam eu purus commodo, porttitor odio commodo, tristique nunc. Suspendisse vitae vehicula turpis. Aenean turpis nibh, auctor ac mollis congue, iaculis id tortor. Morbi in est erat. Proin aliquam varius neque a sollicitudin. Vestibulum varius in urna sit amet hendrerit.</p>
<h4 class="title--h4">This is a fourth level heading</h4>
<p>Donec vitae est sapien. Nulla facilisi. Quisque sed auctor ante, sed viverra elit. Quisque justo arcu, vulputate tempor odio ac, mollis blandit justo. Morbi viverra tincidunt leo vel mattis. Aliquam erat volutpat. Nunc tortor tellus, porta sit amet tellus sed, interdum condimentum ex. </p>
Again, I realize this is going to lead to unintended structure (I included an example of this in the above demonstration), but this is what my clients are asking for, so I'm giving in.
The code I have so far will track the previous heading level and determine what the new level should be, but I'm having difficulty understanding how to actually replace the tags correctly. My understanding is that modifying the DOM with $node->replaceChild() is going to result in items getting skipped, because the DOM is changing while it's being parsed. Additionally, I'd like to retain all attributes on each heading, but I've been unable to locate a method for this; everything suggests copying individual attributes manually, but because this is CMS-driven, I'm worried that custom or unexpected attributes will be missed.
Here's the filter I have so far:
/**
* Ensure heading levels are always in sequence
*
* @param string $content
* @return string
*/
function namespace_fix_title_sequence(string $content): string {
if (! (is_admin() && ! wp_doing_ajax()) && $content) {
$DOM = new DOMDocument();
/**
* Use internal errors to get around HTML5 warnings
*/
libxml_use_internal_errors(true);
/**
* Load in the content, with proper encoding and an `<html>` wrapper required for parsing
*/
$DOM->loadHTML("<?xml encoding='utf-8' ?><html>{$content}</html>", LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
/**
* Clear errors to get around HTML5 warnings
*/
libxml_clear_errors();
/**
* Use XPath to query headings
*/
$XPath = new DOMXPath($DOM);
$headings = $XPath->query("//*[self::h1 or self::h2 or self::h3 or self::h4 or self::h5 or self::h6]");
/**
* Track previous heading level
*/
$previous_level = 1;
foreach ($headings as $heading) {
/**
* Get the current level
*/
$current_level = intval(preg_replace("/^h/", "", $heading->nodeName));
/**
* Determine the target level
*/
$target_level = ($current_level - $previous_level <= 1 ? $current_level : $previous_level + 1);
/**
* DEBUG
*/
echo "<p>Previous: {$previous_level}</p>";
echo "<p>Current: {$current_level}</p>";
echo "<p>Target: {$target_level}</p>";
echo "<hr />";
/**
* Replace current level with target level
*/
// ?
/**
* Update the previous level
*/
$previous_level = $target_level;
}
/**
* Save changes, remove unneeded tags
*/
$content = implode(array_map([$DOM->documentElement->ownerDocument, "saveHTML"], iterator_to_array($DOM->documentElement->childNodes)));
}
return $content;
}
add_filter("the_content", "namespace_fix_title_sequence", 100, 1);
The ideal unrealistic solution
In an ideal world, the best solution would be to totally prevent the content writer from selecting incorrect heading levels in the interface of their WYSIWYG.
Likewise, you would ideally force them to provide non-empty alt text for images and a label for every input field, forbid empty links, and so on.
At any given place in the document, they would only be allowed to insert a heading of level 1 to N+1, where N is the level of the previous heading.
Adjustments might also have to be propagated: changing an H3 into an H2 in the middle of the text should also change all the following H4s into H3s, down to the next H2, and so on recursively.
This is, as you see, not as easy as we may think at first.
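To give an idea of what such an editor restriction would involve, here is a minimal JavaScript sketch that works on a plain array of heading levels rather than on a real editor. The function names `allowedLevels` and `shiftSection` are made up for illustration, and `shiftSection` encodes just one possible reading of the propagation rule: the changed heading and everything nested under it move by the same amount.

```javascript
// Levels a writer may pick right after a heading of level `previousLevel`:
// anything from 1 up to one level deeper, capped at h6.
function allowedLevels(previousLevel) {
  const max = Math.min(previousLevel + 1, 6);
  const levels = [];
  for (let level = 1; level <= max; level++) {
    levels.push(level);
  }
  return levels;
}

// Change the heading at `index` to `newLevel` and shift every heading nested
// under it (the following headings that are deeper than its old level) by the
// same amount. Assumption: this is one interpretation of "propagating" a change.
function shiftSection(levels, index, newLevel) {
  const oldLevel = levels[index];
  const delta = newLevel - oldLevel;
  const result = levels.slice();
  result[index] = newLevel;
  for (let i = index + 1; i < levels.length && levels[i] > oldLevel; i++) {
    // Keep the shifted level inside the valid h1..h6 range
    result[i] = Math.min(6, Math.max(1, levels[i] + delta));
  }
  return result;
}

console.log(allowedLevels(3));
console.log(shiftSection([2, 3, 4, 4, 3, 2], 1, 2));
```

Even this toy version has to decide where a "section" ends, which is part of why a real editor integration is harder than it first looks.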
Sadly, not only is it hard to develop and to use, but writers are probably not ready for it either. Those who don't understand the need for a correct structure will probably see the restriction as a bug, or as a stupid software limitation on their freedom to write in whatever way they like.
Maybe you could decouple the heading level from the corresponding visual style to avoid frustration, but that quickly becomes even more complicated.
So the only things you can do are educate content writers or, as you are proposing here, try to fix the incorrect structure automatically.
Algorithm to fix heading structure
Before getting into the real task of DOM manipulation, let's talk a little about the algorithm. It is of course impossible to always restore the structure exactly as the author intended, but the goal is still to choose the most probable intention.
Take a sequence similar to yours: suppose the author wrote H2, H4, H3, H3. Is the simplest fix, H2, H3, H3, H3, the most appropriate?
What about H2, H3, H4, H4? That alternative rests on the assumption that if two elements are visually different, they were probably intended to be at different levels, and conversely, if two elements are visually identical, they were probably intended to be at the same level.
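The two candidate fixes can be compared on plain arrays of levels. This is only a sketch: `clampLevels` mirrors the `$target_level` logic from the question's filter, while `mapByFirstAppearance` is an assumed way to formalise the "same look, same level" idea (each new original level gets the next deeper level, in order of first appearance). Both names and the `startLevel` parameter are hypothetical.

```javascript
// Strategy A: never go more than one level deeper than the previous
// (already corrected) heading. Same rule as the question's PHP filter.
function clampLevels(levels, startLevel = 1) {
  let previous = startLevel;
  return levels.map((current) => {
    const target = current - previous <= 1 ? current : previous + 1;
    previous = target;
    return target;
  });
}

// Strategy B (assumed interpretation): headings that shared an original level
// keep sharing one; each newly seen original level is assigned the next
// deeper level, capped at h6.
function mapByFirstAppearance(levels, startLevel = 1) {
  const assigned = new Map();
  let next = null;
  return levels.map((current) => {
    if (!assigned.has(current)) {
      next = next === null
        ? Math.min(current, startLevel + 1)
        : Math.min(next + 1, 6);
      assigned.set(current, next);
    }
    return assigned.get(current);
  });
}

console.log(clampLevels([2, 4, 3, 3]));          // [ 2, 3, 3, 3 ]
console.log(mapByFirstAppearance([2, 4, 3, 3])); // [ 2, 3, 4, 4 ]
```

Neither result is "right" in general; they simply make different guesses about what the author meant.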
DOM manipulation
As far as I know, most DOM APIs I have seen (in Java, JavaScript, PHP, C++, etc.) don't allow you to change an element's name in place. You must create a new node to do that.
You can't simply change an H4 into an H3 while retaining the inner structure untouched for example.
So, if you indeed can't change the element name in place, you need to:
Create a fragment F with the inner structure of the H4 you want to change into H3.
If fragments are also unavailable or if extracting a fragment is complicated in the DOM API you are using, you have to clone child nodes one by one in an array.
Create the H3 node and put F into it
Copy the attributes of the H4 onto the H3. There is probably no other way than to do it one by one.
Replace the H4 with the new H3. Alternatively, if there is no replaceChild method, insert the H3 before the H4 and then remove the H4.
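In JavaScript terms, the steps above could look roughly like this. `renameElement` is a hypothetical helper name; the calls it relies on (`createElement`, `setAttribute`, `appendChild`, `replaceChild` and the `attributes` collection) are standard DOM methods, and PHP's DOM extension offers equivalents under the same names. Looping over the `attributes` collection copies whatever is present, so custom or unexpected attributes are not missed. Moving the children one by one stands in for the fragment step.

```javascript
// Replace `el` with a new element called `newTag`, keeping its attributes
// and its children. Returns the new element.
function renameElement(el, newTag) {
  const replacement = el.ownerDocument.createElement(newTag);

  // Copy every attribute, whatever its name is
  for (const attr of Array.from(el.attributes)) {
    replacement.setAttribute(attr.name, attr.value);
  }

  // Move (not clone) the children into the new element.
  // appendChild detaches each child from `el`, so the loop terminates.
  while (el.firstChild) {
    replacement.appendChild(el.firstChild);
  }

  // Swap the old element for the new one in the tree
  el.parentNode.replaceChild(replacement, el);
  return replacement;
}

// Possible usage in a browser (not executed here):
// const headings = Array.from(document.querySelectorAll('h1, h2, h3, h4, h5, h6'));
// headings.forEach((h) => { /* compute the target level, then renameElement(h, 'h3') */ });
```

Collecting the headings into an array before replacing anything means the loop works on a fixed snapshot, so replacing nodes along the way should not cause any heading to be skipped.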
I have a document where there's a component name that's repeated hundreds of times. I'm trying to make this document into a template where the component name will change from report to report. Instead of having hundreds of iterations of "Example123" in plain text, I'd like to define a variable text string, for example "&ComponentName;" and use that throughout the template so that any change to that variable changes each instance of the component name. Thereby, creating a situation where anyone can create a new document for different components with one change instead of hundreds. Is something like that possible when just using HTML?
I've tried looking up every element in HTML on W3Schools to see if there's something like this, but to no avail. I've also tried searching Stack Overflow for this, but I think I might be using the wrong terminology and I'm not sure how else to describe what I'm looking for. When I think of "variable" I'm thinking of "X" which can be defined by the user, but when I look up "variable text in html" I tend to get results about <var> which doesn't help in this use case.
I tried to use
<script>
const string = "The revolution will not be televised.";
console.log(string);
</script>
to see if the text string would appear in the output, but nothing appeared, and I'm unfamiliar with JavaScript.
Use some placeholder text inside your HTML, and then replace all instances of that text with your component name in JavaScript:
const placeholder = '#CMPNT#'
const componentName = 'MY COMPONENT NAME'
document.querySelector('.container').innerHTML = document.querySelector('.container').innerHTML.replaceAll(placeholder, componentName)
<div class="container">
<p>Lorem #CMPNT# dolor sit amet, consectetur adipiscing
elit. Vivamus ac scelerisque augue. Donec eu mattis libero.
Quisque gravida sit amet tellus id #CMPNT#. Orci varius
natoque penatibus et magnis dis parturient montes,
nascetur ridiculus mus. Suspendisse #CMPNT# et magna vel
congue. In non maximus diam. Suspendisse tristique est
vitae nibh sollicitudin varius. Integer at dolor vitae felis
placerat fringilla eu sit amet ipsum. #CMPNT# euismod ipsum
eget neque rhoncus sodales. Cras velit dui, tempus at pulvinar
eget, varius sed #CMPNT#. Donec egestas, erat nec luctus
suscipit, libero quam maximus mauris, id sagittis ligula
quam condimentum lacus.</p>
</div>
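If the template ends up with more than one variable, the same idea can be wrapped in a small function. This is only a sketch: `fillPlaceholders` is a hypothetical name, and the `#NAME#` pattern (uppercase letters, digits and underscores between two `#` characters) is an assumption that extends the `#CMPNT#` placeholder used above.

```javascript
// Replace every #KEY# in `template` with values[KEY].
// Placeholders without a matching key are left untouched.
function fillPlaceholders(template, values) {
  return template.replace(/#([A-Z0-9_]+)#/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : match
  );
}

const filled = fillPlaceholders(
  'Report for #CMPNT#, revision #REV#. Contact: #OWNER#',
  { CMPNT: 'Example123', REV: '7' }
);
console.log(filled); // Report for Example123, revision 7. Contact: #OWNER#
```

In a page, the result would be assigned back to the container's innerHTML from a script placed after the container element. Note also that console.log writes to the browser's developer console rather than to the page, which is why the earlier test showed nothing on screen.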
I’m using Sublime 3 to prepare HTML files that will eventually be turned into an epub in Sigil. This is working very well except that the formatting isn’t helping the readability.
I have HTMLbeautify and HTML/CSS/JSPrettify. They do a great job with the indentation but I would also like a method of putting the opening and closing paragraph tags on new lines, something like
<p>
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse rutrum dolor in lacus efficitur consequat. Cras turpis dolor, pretium sit amet tincidunt sed, porta iaculis lectus. Morbi consectetur vitae justo eu pretium.
</p>
Can anybody help?
I've read all the other Sublime/HTML formatting queries and I can't find anything that quite covers this.
Just select all lines (Ctrl+A) and then from the menu select Edit → Line → Reindent. This will work.
Is there a way to specify that text must be laid out in multiple columns, with the column width defined as a percentage?
Something like:
<div style="width:20%; max-height:100px;" >Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam sodales urna non odio egestas tempor. Nunc vel vehicula ante. Etiam bibendum iaculis libero, eget molestie nisl pharetra in. In semper consequat est, eu porta velit mollis nec Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam sodales urna non odio egestas tempor. Nunc vel vehicula ante.
</div>
If the text overflows the div's bounds, a new column is displayed.
I'd probably advise having a look at CSS3's Multi Column functionality if you don't have to support older browsers:
http://www.w3.org/TR/css3-multicol/
This is not currently supported using native HTML. Currently JavaScript must be used to obtain this feature.
See: http://www.htmlgoodies.com/html5/tutorials/how-to-create-multi-columns-in-css3-and-javascript.html#fbid=uRKNCpHfmWY
This is a JavaScript Solution:
I built an iPad web app in my last term at uni, which was supposed to be a newspaper app. To get the newspaper-like layout we used this jQuery plugin:
http://archive.plugins.jquery.com/project/Columnizer
or: http://welcome.totheinter.net/columnizer-jquery-plugin/
You can specify the width of your columns and the number of columns, and it has quite a few other features which might be useful for your purpose (though we didn't actually need them).
How would you programmatically abbreviate XHTML to an arbitrary number of words without leaving unclosed or corrupted tags?
i.e.
<p>
Proin tristique dapibus neque. Nam eget purus sit amet leo
tincidunt accumsan.
</p>
<p>
Proin semper, orci at mattis blandit, augue justo blandit nulla.
<span>Quisque ante congue justo</span>, ultrices aliquet, mattis eget,
hendrerit, <em>justo</em>.
</p>
Abbreviated to 25 words would be:
<p>
Proin tristique dapibus neque. Nam eget purus sit amet leo
tincidunt accumsan.
</p>
<p>
Proin semper, orci at mattis blandit, augue justo blandit nulla.
<span>Quisque ante congue...</span>
</p>
Recurse through the DOM tree, keeping a word count variable up to date. When the word count exceeds your maximum word count, insert "..." and remove all following siblings of the current node, then, as you go back up through the recursion, remove all the following siblings of each of its ancestors.
You need to think of the XHTML as a hierarchy of elements and treat it as such. This is basically the way XML is meant to be treated. Then just go through the hierarchy recursively, adding the number of words together as you go. When you hit your limit throw everything else away.
I work mainly in PHP, and I would use the DOMDocument class in PHP to help me do this; you need to find something like that in your chosen language.
To make things clearer, here is the hierarchy for your sample:
- p
  - Proin tristique dapibus neque. Nam eget purus sit amet leo tincidunt accumsan.
- p
  - Proin semper, orci at mattis blandit, augue justo blandit nulla.
  - span
    - Quisque ante congue justo
  - , ultrices aliquet, mattis eget, hendrerit,
  - em
    - justo
  - .
You hit the 25-word limit inside the span element, so you remove all remaining text within the span and add the ellipsis. All the following child elements (both text and tags) can be discarded, and all subsequent elements can be discarded.
This should always leave you with valid markup as far as I can see: because you are treating it as a hierarchy and not just plain text, all the closing tags that are required will still be there.
Of course, if the XHTML you are dealing with is invalid to begin with, don't expect the output to be valid.
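The recursion described above can be sketched in JavaScript on a simplified tree, where a node is either a string (text) or an object with a `tag` and `children`. This representation, and the names `abbreviate`, `walk` and `render`, are assumptions made for illustration; with a real parser you would walk its element and text nodes the same way.

```javascript
// Walk the tree depth-first, sharing one word budget across the recursion.
function walk(node, state) {
  if (typeof node === 'string') {
    const words = node.split(/\s+/).filter(Boolean);
    if (words.length <= state.left) {
      state.left -= words.length;
      return node; // the whole text fits
    }
    // Limit reached inside this text: keep what fits, add the ellipsis, stop
    const kept = words.slice(0, state.left).join(' ');
    state.left = 0;
    state.done = true;
    return kept + '...';
  }
  const children = [];
  for (const child of node.children) {
    if (state.done) break; // discard all following siblings
    children.push(walk(child, state));
  }
  // The element itself is always returned, so its closing tag survives
  return { tag: node.tag, children };
}

function abbreviate(tree, maxWords) {
  return walk(tree, { left: maxWords, done: false });
}

// Serialize the simplified tree back to markup
function render(node) {
  if (typeof node === 'string') return node;
  return `<${node.tag}>${node.children.map(render).join('')}</${node.tag}>`;
}

const sample = {
  tag: 'div',
  children: [
    { tag: 'p', children: ['Proin tristique dapibus neque. Nam eget purus sit amet leo tincidunt accumsan.'] },
    { tag: 'p', children: [
      'Proin semper, orci at mattis blandit, augue justo blandit nulla. ',
      { tag: 'span', children: ['Quisque ante congue justo'] },
      ', ultrices aliquet, mattis eget, hendrerit, ',
      { tag: 'em', children: ['justo'] },
      '.',
    ] },
  ],
};

console.log(render(abbreviate(sample, 25)));
```

With a limit of 25 the second paragraph ends at `<span>Quisque ante congue...</span>`, as in the question. If the budget runs out exactly at the end of a text node, this sketch places the ellipsis at the start of the next text it meets, which may or may not be what you want.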