Prevent custom (broken) html from breaking the rest of the page - html

The product I work on supports users providing custom descriptions in markdown format (this is new, previously they could only provide raw html). Unfortunately many users have been using this product for years and as a result there are many descriptions that consist of markup that "sort of works" or "works in IE8".
I don't particularly care if their descriptions render incorrectly because they are broken; what concerns me is that the rest of the page shouldn't break because of them.
Example broken code
<ul>
</li>
<li>foo</li>
<li>bar</li>
</li>
<!-- no closing ul -->
Things I have done to mitigate the effect
I remove the following tags: html head body style frameset frame iframe script markdown-rendered
Surround descriptions with <markdown-rendered> as a way to contain the code.
Even with these mitigations, code like the example above still "breaks out". For the above example, a large amount of markup after it shifts inside the ul. What else can I do to "contain" bad markup?

The moment you inject the invalid markup into the document, it's going to be parsed and repaired to the best of the browser's ability. I would suggest doing this beforehand, and injecting the result of this operation, rather than allowing this operation to potentially disrupt your pre-existing structure.
One way in which libraries and frameworks have done this in the past is to create a bit of temporary structure, assign the invalid markup as the innerHTML, and then read back out the innerHTML:
var markup = clean( "<ul></li><li>foo</li><li>bar</li></li>" );
console.log( markup ); // "<ul><li>foo</li><li>bar</li></ul>"

// Let the browser parse and repair the markup inside a detached element,
// then read back the serialized, well-formed result.
function clean ( invalid ) {
    var container = document.createElement( "div" );
    container.innerHTML = invalid;
    return container.innerHTML;
}
When the markup is assigned, it will be parsed, repaired, and constructed into actual DOM objects. When we read back out the innerHTML, we'll get nice and clean code directly from the browser.


How to Create an HTML Template?

Problem
I have a collection of images with linked captions on a page. I want them each to have identical HTML.
Typically, i copy and paste the HTML over and over for each item. The problem is, if i want to tweak the HTML, i have to do it for all of them. It's time-consuming, and there's risk of mistakes.
Quick and Dirty Templating
I'd like to write just one copy of the HTML, list the content items as plain text, and on page-render the HTML would get automatically repeated for each content-item.
HTML
<p><img src=IMAGE-URL>
<br>
<a target='_blank' href=LINK-URL>CAPTION</a></p>
Content List
IMAGE-URL, LINK-URL, CAPTION
/data/khang.jpg, https://khangssite.com, Khang Le
/data/sam.jpg, https://samssite.com, Sam Smith
/data/joy.jpg, https://joyssite.com, Joy Jones
/data/sue.jpg, https://suessite.com, Sue Sneed
/data/dog.jpg, https://dogssite.com, Brown Dog
/data/cat.jpg, https://catssite.com, Black Cat
Single Item
Ideally, i could put the plain-text content for a single item anywhere on a page, with some kind of identifier to indicate which HTML template to use (similar to classes with CSS).
TEMPLATE=MyTemplate1, IMAGE-URL=khang.jpg, LINK-URL=https://khangssite.com, CAPTION=Khang Le
Implementation
Templating systems are widely used, like Django and Smarty on the server side, and Mustache on the client side. This question seeks a simple, single-file template solution, without using external libs.
I want to achieve this without a framework, library, etc. I'd like to put the HTML and content-list in the same .html file.
Definitely no database. It should be quick and simple to set it up within a page, without installing or configuring additional services.
Ideally, i'd like to do this without javascript, but that's not a strict requirement. If there's javascript, it should be ignorant of the fieldnames. Ideally, very short and simple. No jquery please.
Do you mean template literals (template strings)?
const arrData =
[ { img: '/data/khang.jpg', link: 'https://khangssite.com', txt: 'Khang Le' }
, { img: '/data/sam.jpg', link: 'https://samssite.com', txt: 'Sam Smith' }
, { img: '/data/joy.jpg', link: 'https://joyssite.com', txt: 'Joy Jones' }
, { img: '/data/sue.jpg', link: 'https://suessite.com', txt: 'Sue Sneed' }
, { img: '/data/dog.jpg', link: 'https://dogssite.com', txt: 'Brown Dog' }
, { img: '/data/cat.jpg', link: 'https://catssite.com', txt: 'Black Cat' }
]
const myObj = document.querySelector('#my-div')
arrData.forEach(({ img, link, txt }) =>
{
myObj.innerHTML += `
<p>
<img src="${img}">
<br>
<a target='_blank' href="${link}">${txt}</a>
</p>`
});
<div id="my-div"></div>
This answer is a complete solution. It's exciting to edit the HTML template in codepen and watch the layout of each copy change in real time -- similar to the experience of editing a CSS class and watching the live changes.
Here's the code, followed by explanation.
HTML
<span id="template-container"></span>
<div hidden id="template-data">
IMG,, LINK,, CAPTION
https://www.referenseo.com/wp-content/uploads/2019/03/image-attractive.jpg,, khangssite.com,, Khang Le
https://i.redd.it/jeuusd992wd41.jpg,, suessite.com,, Sue Sneed
https://picsum.photos/536/354,, catssite.com,, Black Cat
</div>
<template id="art-template">
<span class="art-item">
<p>
<a href="${LINK}" target="_blank">
<img src="${IMG}" alt="" />
<br>
${CAPTION}
</a>
</p>
</span>
</template>
Javascript
window.onload = function LoadTemplate() {
  // get template data.
  let sRawData = document.querySelector("#template-data").innerHTML.trim();

  // load header and data into arrays
  const headersEnd = sRawData.indexOf("\n");
  const headers = sRawData.slice(0, headersEnd).split(",,");
  const aRows = sRawData.slice(headersEnd).trim().split("\n");
  const data = aRows.map((element) => {
    return element.split(",,");
  });

  // grab template and container
  const templateHtml = document.querySelector("template").innerHTML;
  const container = document.querySelector("#template-container");

  // make html for each record
  data.forEach((row) => {
    let workingCopy = templateHtml;

    // load current record into template
    headers.forEach((header, column) => {
      let value = row[column].trim();
      let placeholder = `\$\{${header.trim()}\}`;
      workingCopy = workingCopy.replaceAll(placeholder, value);
    });

    // append template to page, and loop to next record
    container.innerHTML += workingCopy;
  });
};
New version on github:
https://github.com/johnaweiss/HTML-Micro-Templating
Requirement
As specified in the question, this solution is intended to optimize the coding experience on the HTML side. That's the whole point of any web templating. Therefore, the JS has to work a little harder to make life easier for the HTML programmer.
The question seeks a reusable solution. Therefore, JS should be ignorant of the template, fields, and data-list. So unlike @MisterJojo's answer, the template and all data are in my HTML, not javascript. The JS code is generic.
Design
My solution is based on the <template> tag, which is intended for precisely this usage. It has various advantages: the template isn't displayed, processed, or validated by the browser, so it has less impact on performance, and the programmer doesn't have to write an explicit display:none style.
https://news.ycombinator.com/item?id=33089975
However, <template> tags are normally only intended for loading content into the layout. That's inadequate. This tool allows template variables anywhere in the HTML, including inside the tags (eg attributes like <img src).
HTML
My HTML has three blocks:
template: The HTML coder develops their desired display-structure of the output, in real HTML (not plain text). Uses <template>
data: The list of records, each of which should be rendered using the same template. Uses an element with the hidden attribute (a <div> in the code above).
container: The place to display all the output blocks. Uses <span>.
Template
My sample template includes 3 placeholders for data:
${LINK}
${IMG}
${CAPTION}
But of course you can use any placeholders, and any number of them. I use the template-literal delimiter style (although i'm not actually using them as template literals -- i just borrowed the delimiter style).
Data Element
The question specifies data should be stored in HTML. It should require minimal keystrokes.
I didn't want to redundantly retype the fieldnames on every row. I didn't use slotting, JS object, JSON, or XML syntax, because those are all verbose.
https://developer.mozilla.org/en-US/docs/Web/Web_Components/Using_templates_and_slots
It's a simple delimited list. I eliminated all braces, brackets, equals, parens, colons etc.
I put the fieldname-headers only on the first row. The headers are a visual aid for the HTML developer, and a key for Javascript to know the fieldnames and order.
Record Delimiter: End-of-line
Field Delimiter: Double-commas. Seems safe, and they're easy to type. I don't expect to see double-commas in any actual data. Beware, the developer must enter a space for any empty cells, to prevent unintended double-commas. The programmer can easily use a different delimiter if they prefer, as long as they update the Javascript. You can use single-commas if you're sure there will be no embedded commas within a cell.
The data block is hidden using the hidden attribute. No CSS needed.
The code above uses a <div>; because it is hidden, it takes up no room on the page.
JAVASCRIPT
Data
The data is processed by Javascript with two split statements, first on newline delimiter, then on the double-comma delimiter. That puts the whole thing into a 2D array. My JS uses trims to get rid of extra whitespace as needed.
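The two-step split described above can be sketched as a standalone function. This is only a minimal sketch, not the exact code from the answer: the name parseDelimitedBlock and its delimiter parameter are assumptions introduced here, and it takes the raw text as a string so it does not depend on the DOM.

```javascript
// Hypothetical helper: turns the text of the hidden data block into
// { headers, rows }. The first non-empty line holds the field names.
function parseDelimitedBlock(rawText, delimiter = ",,") {
  const lines = rawText
    .trim()
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  // Split every line on the field delimiter and trim each cell.
  const table = lines.map((line) =>
    line.split(delimiter).map((cell) => cell.trim())
  );

  // First row is the header row; everything else is data.
  const [headers = [], ...rows] = table;
  return { headers, rows };
}

const sample = `
  IMG,, LINK,, CAPTION
  /data/khang.jpg,, khangssite.com,, Khang Le
  /data/sue.jpg,, suessite.com,, Sue Sneed
`;
const parsed = parseDelimitedBlock(sample);
console.log(parsed.headers); // [ 'IMG', 'LINK', 'CAPTION' ]
console.log(parsed.rows.length); // 2
```

Passing the delimiter as a parameter is one way to let the HTML developer pick a different separator without editing the function body.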
Place-holder Substitution
Handling multiple entries requires plugging each entry into the template.
i went with simple string replacement instead of template literals.
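The substitution step can be isolated into a pure function, which makes it easy to try outside the browser. This is a hedged sketch: fillTemplate is a hypothetical name, and it uses split/join rather than the replaceAll call from the code above, so that a "$" inside a data value is never interpreted as a replacement pattern.

```javascript
// Hypothetical helper: replaces every ${NAME} placeholder in `template`
// with the matching cell from `row`, using `headers` for the field order.
function fillTemplate(template, headers, row) {
  return headers.reduce((html, header, column) => {
    const placeholder = "${" + header + "}";
    const value = row[column] ?? ""; // tolerate rows with missing cells
    // split/join replaces all occurrences literally, with no regex escaping
    return html.split(placeholder).join(value);
  }, template);
}

const template = '<a href="${LINK}"><img src="${IMG}" alt=""><br>${CAPTION}</a>';
const headers = ["IMG", "LINK", "CAPTION"];
const row = ["/data/khang.jpg", "https://khangssite.com", "Khang Le"];
console.log(fillTemplate(template, headers, row));
// <a href="https://khangssite.com"><img src="/data/khang.jpg" alt=""><br>Khang Le</a>
```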
Multiple Templates
New version which supports multiple templates, and ability to use same template in multiple locations on same page.
https://github.com/johnaweiss/HTML-Micro-Templating
Future
Inspired by @MisterJojo, an earlier version of my solution used template literals to do the substitution. However, that was a bit more complicated and verbose, and seemed to require use of eval. So i switched to .replaceAll. Yet template literals seem like a more appropriate method for templates, so maybe i'll revisit that.
A future version may adapt to whatever custom field-delimiter the HTML developer uses for the data block.
The dollar-curly delimiter for placeholders is a bit awkward to type. So i'm interested in finding a less awkward non-alpha delimiter that won't conflict with HTML. Considering double-brackets or braces [[NAME]]
Maybe there are simpler ways to pull the data-table into JS.
I've read components work well with <template>, but i didn't go there.
Imo, the JS committee should develop a variable-placeholder feature for <template> tags, and natively accommodate storing the data in HTML. It would be great if something like this solution was part of the rendering engine.
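As an illustration of the double-bracket delimiter considered above, here is a minimal sketch. It is not part of the published solution: fillBracketTemplate is a hypothetical name, and it takes each record as a plain object keyed by field name, which is an assumption made for brevity.

```javascript
// Hypothetical variant: placeholders look like [[NAME]] and values come
// from a plain object keyed by field name.
function fillBracketTemplate(template, record) {
  return template.replace(/\[\[([A-Za-z0-9_-]+)\]\]/g, (match, name) =>
    // Leave unknown placeholders untouched so typos stay visible.
    Object.prototype.hasOwnProperty.call(record, name)
      ? String(record[name])
      : match
  );
}

console.log(
  fillBracketTemplate('<a href="[[LINK]]">[[CAPTION]]</a> [[MISSING]]', {
    LINK: "https://khangssite.com",
    CAPTION: "Khang Le",
  })
);
// <a href="https://khangssite.com">Khang Le</a> [[MISSING]]
```

Because the replacement is produced by a callback, the whole template is processed in a single pass and values are inserted literally.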

What are these HTML <c- g> tags? Undefined custom element?

When looking at the source code of the HTML standard, there were some tags that I didn't recognise.
For example in this snippet:
<pre><code class='idl'>[<c- g>Exposed</c->=<c- n>Window</c->]
<c- b>interface</c-> <dfn id='htmlparagraphelement' data-dfn-type='interface'><c- g>HTMLParagraphElement</c-></dfn> : <a id='the-p-element:htmlelement' href='dom.html#htmlelement'><c- n>HTMLElement</c-></a> {
[<a id='the-p-element:htmlconstructor' href='dom.html#htmlconstructor'><c- g>HTMLConstructor</c-></a>] <c- g>constructor</c->();
// <a href='obsolete.html#HTMLParagraphElement-partial'>also has obsolete members</a>
};</code></pre>
From https://html.spec.whatwg.org/multipage/grouping-content.html
I thought these might be custom elements, but it doesn't look like they are defined via the custom element registry. This is the result of interrogating the customElements object.
>>> customElements.get('c')
undefined
>>> customElements.get('c-')
undefined
Is this allowed? (I'd guess so since it's from the HTML standard, but it's still surprising to me). How would the browser know how these elements are supposed to be displayed? For example display: block vs. display: inline.
These are custom-elements (and valid HTML), generated by bikeshed's highlighter.
There is no need to define these as customElements because they don't bring any particular behavior, all they do is to ... save bandwidth.
Here is the commit excerpt:
🚨 TERRIBLE-HACK-ALERT 🚨 Switch to using <c- kt> instead of <span class='kt'> to cut the weight of highlighting in half. Still valid HTML!
So apparently by switching from <span class="kt"> to <c- kt> (and span.kt { to c-[kt]{) they saved half of the weight induced by their highlighting.
Though as they say, it's a "terrible-hack", which still can make sense when building a tool that generates the majority of Web Standards pages, which can get very lengthy.
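To make the "half the weight" claim concrete, the per-token markup overhead of the two forms can be counted directly. This is just an illustrative sketch: the helper names spanMarkup and compactMarkup are made up here, and "kt" is the token class taken from the commit message above.

```javascript
// Compare the markup overhead of the two ways to tag one highlighted token.
function spanMarkup(kind, text) {
  return `<span class='${kind}'>${text}</span>`;
}

function compactMarkup(kind, text) {
  return `<c- ${kind}>${text}</c->`;
}

const token = "interface";
// Overhead = characters of markup added around the token itself.
const spanOverhead = spanMarkup("kt", token).length - token.length;
const compactOverhead = compactMarkup("kt", token).length - token.length;

console.log(spanOverhead); // 24
console.log(compactOverhead); // 12
```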
Regarding the default display of such custom elements, I'll quote Alohci's comment, which put it nicely:
All elements take the initial, or inherited for inherited properties, value of each CSS property until specified otherwise. So they would be display:inline
And regarding your expectation to see only best practices in the specs' sources, it's better not to assume so. Read the content of these pages; don't look at how they're built.
Most spec editors don't look at the tools that will generate the pages; they write the specs in a pseudo-HTML language full of templates.
Or as it's put in the source:
<!-- Note: This file is NOT HTML, it's a proprietary language that is then post-processed into HTML. -->

Adding HTML to Word using OpenXML but unable to style the content (.Net Core)

I managed to add HTML (text only) to a Word-document following this post Add HTML String to OpenXML, using an already existing Word-file.
Unfortunately, I can't find any solution to use style from this Word-template for my newly added text. It is always "Times New Roman" size 12px although the standard style of the used template is "Arial" size 9px.
So far I have tried:
Using the ParagraphProperties as I would do for not HTML texts.
Paragraph para = body.AppendChild(new Paragraph());
Run run = para.AppendChild(new Run());
run.AppendChild(altChunk);
para.ParagraphProperties = new ParagraphProperties(new ParagraphStyleId() { Val = "berschrift2" });
Turning MatchSource off
AltChunkProperties altChunkProperties = new AltChunkProperties();
altChunkProperties.MatchSource = new MatchSource() { Val = new OnOffValue(false) };
altChunk.AppendChild<AltChunkProperties>(altChunkProperties);
Any suggestions?
EDIT:
I found a workaround, which isn't really a solution for my question, but works for me. I'm no longer trying to use the style from Word, but adding the styles to my HTML before using altChunk.
Some explanation: if you look at the definition of altChunk in ISO 29500-1 17.17.2.1 and specifically in the A.1 section, the schema shows that altChunk is an EG_BlockLevelElts element, which makes it a peer of paragraphs (i.e. w:p). It is technically not correct to add it as a child of a run or even a paragraph element. It should be added at the body level. The fact that Word doesn't complain when it is added as a run or paragraph child is unintentional and shouldn't be relied on.
As a result, what Word is doing is using the default style property for fonts to format this new content. You can try this by changing the document defaults in the styles.xml part. With match source property set to false, there isn't a way to specify the font besides document defaults.
Having said that, I think that Thomas' alternative is a better way to go.
The real solution for your question is to transform HTML into Open XML markup "yourself" rather than relying on the alternative format import parts in conjunction with w:altChunk elements. This creates a dependency on how Microsoft Word handles the import, often with little control on your side.
How do you transform HTML (or XML in general) to Open XML markup? The best way is to write so-called recursive pure functional transformations, which translate HTML elements and attributes to Open XML elements and attributes. If you have really simple HTML documents, that is not a big task. However, doing this for "arbitrary" HTML and CSS is quite a feat.
The good news is that the Open-XML-PowerTools, an Open Source library, contains functionality to transform HTML to Open XML and vice versa. Thus, I'd recommend you have a look at that library.
What worked for me and for my situation (if you don't want to go down the rather complex Open-XML-PowerTools HTML converter route) is to add an HTML style attribute to the body section of your HTML fragment as follows:
Encoding.UTF8.GetBytes(
    @$"<html><head><title></title></head><body style=""font-family: Calibri"">{ConvertUnconventionalUnicodeCharsToAscii(htmlAsString)}</body></html>");
It might be possible to dynamically derive the font family of the "normal" style embedded into the document you are updating and insert that name into the style attribute if deemed compatible.
That way, if you decide to change the base/ normal font the style of the HTML import will attempt to utilise the same font family.
Sorry if this is a bit off topic: I also could not get alternativeFormatImportPart.FeedData() to process "’" (code 8217) UTF-16 characters, and so had to specifically replace them with "'" (code 39) to keep them from being rendered as the sequence ’

How to define HTML symbol for special char like Natural join: ⋈

I use Markdown and HTML for my lecture notes, and when I need an unusual character like Natural join I have to use the unmemorable code &#8904; (⋈). Is there any way I can define a symbol, like &MYNATJOIN; in a CSS file (or wherever) that would be replaced with the ⋈ at HTML rendering time?
You can use the character “⋈” as such in HTML, provided that you use UTF-8 and declare it properly, as you should anyway; see my Guide to using special characters in HTML.
Alternatively, much less reliably, you can use the HTML5 character reference &bowtie;. It belongs to the added named references that are completely unnecessary and are not supported by any browser version older than 2011.
In order to define your own entity that you could use as &MYNATJOIN;, you would need to serve your document with an XML content type, which means that old versions of IE will choke on it and that it will be processed in Draconian mode (i.e., any violation of XML well-formedness constraints will cause just an error message to be shown to users, no document content). Under these conditions, you can use XML entity declarations.
CSS is for optional presentational suggestions and should not be used to add significant content, due to the CSS caveats. If you would use “⋈” for decorative purposes or to visually highlight something that is already duly emphasized verbally or in markup, you can add it to the rendering using generated content, e.g.
.funny:after { content: " ⋈" }
in order to append a space and the “⋈” character to the content of every element in class funny.
You can add a small script to the top of your document to do a global replace of your user-defined entity with the character you want it to refer to. The function runs when the document is loaded.
JS (In <head> tag)
window.onload=function () {
  // innerHTML serializes a bare & as &amp;, so the pattern matches that form
  document.body.innerHTML=document.body.innerHTML
    .replace(/&amp;MYNATJOIN;/gi,"⋈");
};
HTML (In <body> tag)
these are some notes. <br />
the entity &MYNATJOIN; should now be a bowtie
You can define more entities by adding more replace statements.
See the code snippet below:
window.onload=function () {
  // innerHTML serializes a bare & as &amp;, so each pattern matches that form
  console.log(document.body.innerHTML);
  document.body.innerHTML=document.body.innerHTML.replace(/&amp;MYNATJOIN;/gi,"⋈");
  console.log(document.body.innerHTML);
  document.body.innerHTML=document.body.innerHTML.replace(/&amp;PLUSMINUS;/gi,"∓");
  console.log(document.body.innerHTML);
  document.body.innerHTML=document.body.innerHTML.replace(/&amp;SINEWAVE;/gi,"∿");
};
<body>
these are some notes.<br />
the entity &MYNATJOIN; should now be a bowtie <br />
a plus or minus looks like this &PLUSMINUS; <br />and how about a sine wave? &SINEWAVE;
</body>
Note that:
There are a litany of ways to trigger javascript to run when a document has loaded, but window.onload is simple and gets the job done.
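The replacement uses a regular expression because, before String.prototype.replaceAll was available, that was the only way to do a global string replace in javascript.
Any bare & in an HTML document is serialized as &amp; when read back through innerHTML, so that is the form the pattern has to match.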
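The chain of replace statements can also be written as a single pass over a lookup table. The following is a minimal sketch under stated assumptions: customEntities and expandCustomEntities are hypothetical names, the function works on a plain string (in the browser you would pass it document.body.innerHTML and assign the result back), and it accepts both the &NAME; form and the serialized &amp;NAME; form.

```javascript
// Hypothetical lookup table: custom entity name -> replacement character.
const customEntities = {
  MYNATJOIN: "\u22C8", // bowtie / natural join
  PLUSMINUS: "\u2213", // minus-or-plus sign
  SINEWAVE: "\u223F", // sine wave
};

// Expands &NAME; for every name in `entities`, case-insensitively.
// Anything that is not in the table is left exactly as it was.
function expandCustomEntities(html, entities) {
  return html.replace(/&(?:amp;)?([A-Za-z][A-Za-z0-9]*);/g, (match, name) => {
    const key = name.toUpperCase();
    return Object.prototype.hasOwnProperty.call(entities, key)
      ? entities[key]
      : match;
  });
}

console.log(
  expandCustomEntities("R &amp;MYNATJOIN; S and a &SINEWAVE; plus &lt;", customEntities)
);
// R ⋈ S and a ∿ plus &lt;
```

Adding another symbol then only means adding one line to the table.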
HTML
<span class='mynatjoin'></span>
CSS
.mynatjoin:before{
content: "\22C8";
}
Result
⋈
If you want it to be even simpler, and you're willing to break your HTML validity, you could use tags instead of classes, like this:
HTML
<mynatjoin></mynatjoin>
CSS
mynatjoin:before{
content: "\22C8";
}
Result
⋈
I don't know if this will cause problems in some browsers, but I tested this in the latest Chrome, FF and IE. It worked. It probably won't work in older browsers.
If you want to do it the way you specified, i.e. &MYNATJOIN;, then you will need to use some sort of javascript which scans the document and replaces &MYNATJOIN; with ⋈. I don't think it is possible with pure HTML and CSS.
Based on the example above, you can have multiple css classes to support your symbols. You can use this to find the css code for your corresponding symbol.
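If you already have the character itself, the CSS escape can be derived from its code point instead of being looked up. This is a small sketch with hypothetical helper names (cssContentEscape, symbolRule); it only generates rule text in the same shape as the .mynatjoin example above.

```javascript
// Hypothetical helper: builds the CSS escape (backslash + hex code point)
// for the first character of `symbol`.
function cssContentEscape(symbol) {
  const codePoint = symbol.codePointAt(0);
  return "\\" + codePoint.toString(16).toUpperCase();
}

// Hypothetical helper: emits a rule shaped like the .mynatjoin example.
function symbolRule(className, symbol) {
  return `.${className}:before { content: "${cssContentEscape(symbol)}"; }`;
}

console.log(symbolRule("mynatjoin", "\u22C8"));
// .mynatjoin:before { content: "\22C8"; }
```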

What do square brackets mean in html?

I am assisting on a project right now and building out templates for the first time. I'm trying to wrap my head around a few things, and one aspect of the HTML that confuses me is the text sitting in square brackets. I've never used these in HTML before, so I'm wondering what they are for (when I open the page in a browser, they all show up as text).
Here's a bit of the code:
<div class="container">
[HASBREADCRUMBS]
<ol class="nav-breadcrumb">
[BREADCRUMBS]
</ol>
[/HASBREADCRUMBS]
<h1 class="header-title" style="color:[TITLECOLOR];font-size:[TITLESIZE];">[TITLE]</h1>
</div>
It's using some templating engine and the whole page is parsed before getting output to the browser. During parsing, those square bracket tags work as something else (depending on the templating engine used).
So, for example, [HASBREADCRUMBS] and [/HASBREADCRUMBS] could denote a piece of code that might be similar to:
if (breadcrumbs) {
and:
} // closed if
and for each value of the breadcrumbs object (whatever it might be) one ordered HTML list is rendered with the breadcrumb value as its content ([BREADCRUMBS]).
So in short: it's not HTML, that part of the file never reaches the browser but is converted into proper HTML (based on conditions, can also use loops, etc.) before rendering.
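To show what such a pre-processing step might look like, here is a deliberately tiny sketch. It is not the engine used in the question, which is unknown: renderBracketTemplate is a hypothetical function, and the convention that [HASNAME]...[/HASNAME] keeps its body only when a value for NAME exists is an assumption based on the snippet.

```javascript
// Hypothetical mini-engine for the bracket syntax in the question.
// Assumed convention: [HASNAME]...[/HASNAME] keeps its body only when
// values.NAME is non-empty; [NAME] is replaced by values.NAME.
function renderBracketTemplate(template, values) {
  // 1. Resolve conditional blocks.
  const withoutBlocks = template.replace(
    /\[HAS([A-Z]+)\]([\s\S]*?)\[\/HAS\1\]/g,
    (match, name, body) => (values[name] ? body : "")
  );
  // 2. Substitute the remaining [NAME] placeholders; unknown ones stay as-is.
  return withoutBlocks.replace(/\[([A-Z]+)\]/g, (match, name) =>
    name in values ? String(values[name]) : match
  );
}

const page =
  '[HASBREADCRUMBS]<ol class="nav-breadcrumb">[BREADCRUMBS]</ol>[/HASBREADCRUMBS]' +
  '<h1 style="color:[TITLECOLOR];">[TITLE]</h1>';

console.log(
  renderBracketTemplate(page, {
    BREADCRUMBS: "<li>Home</li>",
    TITLECOLOR: "#333",
    TITLE: "Welcome",
  })
);
// <ol class="nav-breadcrumb"><li>Home</li></ol><h1 style="color:#333;">Welcome</h1>

console.log(renderBracketTemplate(page, { TITLECOLOR: "#333", TITLE: "Welcome" }));
// <h1 style="color:#333;">Welcome</h1>
```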
The square brackets have nothing to do with HTML. They probably belong to the template and will be replaced with actual values by the template engine.