Putting HTML within a <p> without it becoming elements [duplicate] - html

I've got a JS function that takes a string as a parameter and displays it in a div element. The string may contain HTML tags.
How do I force JS to display the text inside the div as literal HTML text, tags included? Also, what is an adequate way to filter particular tags, i.e. apply certain tags for styling and just print the others?

You just need to replace & and < (and optionally > if you like, but you don't have to) with their respective entities, using String#replace (spec, MDC) for instance.
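A minimal sketch of that replacement, assuming a hypothetical helper name `escapeHtml` (not from the original answer). The ampersand has to be replaced first, otherwise the entities produced by the later replacements would be escaped a second time.

```javascript
// Hypothetical helper: turn markup characters into entities so the
// browser prints the tags instead of interpreting them.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // first, so "&lt;" below is not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;"); // optional, as the answer notes
}

console.log(escapeHtml("<b>bold</b> & more"));
// prints: &lt;b&gt;bold&lt;/b&gt; &amp; more
```

In a browser the result could be assigned to the div's `innerHTML`; assigning the raw string to `textContent` has the same visible effect without manual escaping.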

And, also, what is an adequate way to filter particular tags, i.e. apply certain tags for styling and just print others.
Inserting user-supplied HTML directly is dangerous because it opens the page to XSS. You should use a tool to sanitize the HTML (Stack Overflow itself, for example, lets you use only some HTML tags).
As posted in this question here on SO you can use this client-side sanitizer: http://code.google.com/p/google-caja/source/browse/trunk/src/com/google/caja/plugin/html-sanitizer.js
On the other hand, you may need to do this on the server side; which tool to use depends on your environment (ASP.NET? PHP?).
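For the "apply certain tags, print the others" part, one simple approach is to escape everything and then restore a short allowlist of attribute-free tags. This is only a sketch, not the Caja sanitizer linked above: the names `ALLOWED_TAGS` and `allowSimpleTags` and the choice of tags are assumptions made for illustration.

```javascript
// Assumption: only these attribute-free formatting tags are let through.
const ALLOWED_TAGS = ["b", "i", "em", "strong"];

function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function allowSimpleTags(text) {
  // Matches an escaped opening or closing tag with no attributes,
  // e.g. "&lt;b&gt;" or "&lt;/em&gt;", and turns it back into markup.
  const pattern = new RegExp(
    "&lt;(/?)(" + ALLOWED_TAGS.join("|") + ")&gt;",
    "gi"
  );
  return escapeHtml(text).replace(pattern, "<$1$2>");
}

console.log(allowSimpleTags("<b>hi</b> <script>alert(1)</script>"));
// prints: <b>hi</b> &lt;script&gt;alert(1)&lt;/script&gt;
```

Because a tag is only restored when it has no attributes, something like `<b onclick="...">` stays escaped and is printed as text.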

Related

Remove all inline html attributes, but leave some

I'm trying to write a PHP function with preg_replace that removes all inline attributes of HTML elements, but I want to keep some, like 'href', 'title' and 'alt'.
What I got until now is
([\w\-.:]+)\s*=\s*("[^"]*"|'[^']*'|[\w\-.:]+)
for matching all inline attributes, but it also matches plain text like
href="test" Test
without any HTML around it. Additionally, it matches all inline attributes, including the ones I want to keep.
See my example text here:
https://regex101.com/r/3OVaO2/1
The goal is to remove any dangerous html elements.
I know that I have to handle something for the href-attribute in an extra function.
As already mentioned in the comments, Regex is not the way to go here.
That said: I have come up with this (https://regex101.com/r/3OVaO2/2)
(<\w+\s*[^>]*)\s(?!href|title|alt)[\w\-\d]+=(?:(['"]).*?\2|\w+)
However, this will only remove ONE evil attribute. The problem is that PCRE does not allow variable-length lookbehind assertions. If you switch to the ECMAScript flavor, you can do this (https://regex101.com/r/3OVaO2/3)
(?<=<\w+\s*[^>]*)\s(?!href|title|alt)[\w\-\d]+=(?:(['"]).*?\1|\w+)
This will probably do what you want it to do. Nonetheless, this is NOT the holy grail for sanitizing HTML. Be careful with your output if you don't consider your input safe.
Also, the definition of the tags may need some tweaking, since there may be tags like <some-element>, which are currently not detected by the regular expression.
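As an alternative to a single lookbehind expression, the same goal can be sketched in JavaScript with two passes: an outer replace that finds each opening tag, and an inner loop that keeps only allowlisted attributes. This is a different technique from the regexes above, and the names `KEEP` and `stripAttributes` are hypothetical. It shares the caveat already stated: it is not a full sanitizer, and `href` values would still need a separate check.

```javascript
// Assumption: the attributes the question wants to keep.
const KEEP = new Set(["href", "title", "alt"]);

function stripAttributes(html) {
  // Outer pass: an opening tag = name, attribute text, optional "/".
  // The name pattern also accepts custom elements like <some-element>.
  const tagRe =
    /<([a-zA-Z][\w-]*)((?:\s+[^\s"'<>\/=]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'<>]+))?)*)\s*(\/?)>/g;
  return html.replace(tagRe, (match, tag, attrs, slash) => {
    // Inner pass: walk the attributes and keep the allowlisted ones.
    const attrRe = /([^\s"'<>\/=]+)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s"'<>]+))?/g;
    const kept = [];
    let m;
    while ((m = attrRe.exec(attrs)) !== null) {
      if (KEEP.has(m[1].toLowerCase())) kept.push(m[0]);
    }
    const attrText = kept.length ? " " + kept.join(" ") : "";
    return "<" + tag + attrText + (slash ? " /" : "") + ">";
  });
}

console.log(
  stripAttributes('<a href="x" onclick="evil()" title=\'t\'>Test</a> href="test" Test')
);
// prints: <a href="x" title='t'>Test</a> href="test" Test
```

Since attributes are only examined inside a matched tag, loose text such as `href="test" Test` outside any tag is left untouched.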

Add html element that is "invisible" or skipped by CSS selector rules

I want to build an external GUI that operates on a generic HTML piece that comes with associated CSS. In order to enable some functionalities of the GUI, I would need to create some "meta" HTML elements to contain parts of content and associate them with data.
Example:
<div id="root">
<foo:meta data-source="document:1111" data-xref="...">
sometext
<p class="quote">...</p>
</foo:meta>
<p class="other">...</p>
</div>
This HTML is auto-generated starting from already existing HTML that has associated CSS:
<div id="root">
sometext
<p class="quote">...</p>
<p class="other">...</p>
</div>
#root>p {
color:green;
}
#root>p+p {
color:red;
}
The problem is, when adding the <foo:meta> element, this breaks CSS child and sibling selectors. I am looking for a way for the CSS selectors to keep working when encapsulating content in this way. We have tried foo\:meta{display:contents} style, but, although it works in terms of hiding the meta element from the box renderer, it doesn't hide it from the selector matcher. We do not produce the HTML/CSS to be processed, so writing them in a certain way before processing is not an option. They come as they are, generic HTML documents with associated CSS.
Is there a way to achieve what we are looking for using HTML/CSS?
To restate, we are looking for a way to dynamically encapsulate parts of content in non-visual elements without breaking child and sibling CSS selectors. The elements should only be available to DOM traversal such as document.getElementsByTagName('foo:meta')
If I understood your problem correctly, I would suggest using a space (the descendant combinator) between the ancestor and the child instead of '>'. Also, note that your selector uses an id and not a class.
The '>' combinator you used only selects direct children. Using a space instead lets you select grandchildren and deeper descendants too!
So what you have to do is this:
#root .quote {
color:green;
}
Let me know if this helped.
A working css is here
So, after much fiddling and research, we came to the conclusion that this can't be done, even with Shadow DOM, as even that would require massive CSS rewrites that might not preserve semantics.
However, for anyone stumbling upon this question, we came to the same end by employing the following (I'll be short, pointers only):
using two comments to mark where the tag would start/end, instead of an XML tag (eg. <!--<foo:bar data-source="1111">-->...content...<!--</foo:bar>-->)
these pointers work more or less like the markup equivalent of a DOM Range and they can work together with it.
this approach has the interesting advantage (as opposed to a single node) that it can start and end in different nodes, so it can span subtrees.
But this also breaks the XML structure when you try to recompose it. Also it's quite easy by manipulation to end up with the range end moving before the range start, multiple ranges overlapping etc.
In order to recompose it (to send to a next XML processor or NoSQL XML database for cross-referencing), we need to make sure we avoid the XML-breaking manipulations described above; then, one only needs to convert the encapsulated tags to regular tags by using string manipulation on the document (X)HTML (innerHTML, outerHTML, XMLSerializer) to get clean XML which can be mined and cross-referenced for content.
We used the TreeWalker API to scan the document for comments. You might need it too, although scanning the document for comments this way can be slow (it works for us, though). If you are bolder you can try using XPath, i.e. document.evaluate('//comment()',document); it seems to work, but we don't trust that all browsers comply.
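The recomposition step described above (turning the comment markers back into regular tags by string manipulation) could look roughly like this. It is a sketch under the assumption that every marker has exactly the form `<!--<name ...>-->` or `<!--</name>-->`; the function name `commentsToTags` is hypothetical, and it does not check that start and end markers are properly nested.

```javascript
// Unwraps comment markers such as <!--<foo:bar data-source="1111">-->
// and <!--</foo:bar>--> into real tags; ordinary comments are left alone.
function commentsToTags(markup) {
  const markerRe = /<!--(<\/?[A-Za-z][\w:.-]*(?:\s[^<>]*)?>)-->/g;
  return markup.replace(markerRe, "$1");
}

const serialized =
  '<div id="root"><!--<foo:meta data-source="document:1111">-->sometext' +
  '<p class="quote">...</p><!--</foo:meta>--><p class="other">...</p></div>';

console.log(commentsToTags(serialized));
```

In a browser the input string would come from something like `outerHTML` or `XMLSerializer`, as mentioned above.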

Styling just comments inside a `pre` or `code` block with CSS

Is there a way to style comments inside a pre or code block (e.g. Ruby comments) using only CSS?
For example:
# I am a comment and should be lighter and italic
I = { :am => :normal_code, :and_want_no => :special_treatment }
I know you can use Javascript/jQuery to insert <span> elements in the right spots (like the <span>'s in the comment above provided by Stack Overflow) but can it be done with just CSS?
For background, I use a markdown renderer which outputs simple <pre> and <code> elements where necessary but without any hooks for indicating which language you're using and how to flag comments with <span> elements.
This task can't be done with just CSS.
CSS works at the element level and it is not possible to "select into" general text - even trivially, much less applying some rules to parse language grammar.
As noted, and as seen by inspecting the SO code rendering such as the one in this post, one approach is to output spans with the appropriate CSS classes (which are the result of separate grammar processing) - then these individual spans, which can be selected, are styled.
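A very small sketch of that span approach for the Ruby example in the question. It is deliberately naive: it only recognises lines whose first non-blank character is `#`, so trailing comments and `#` inside strings are not handled, and it is no substitute for real grammar processing. The function name and the `comment` class are assumptions; a rule like `.comment { color: gray; font-style: italic; }` would then style the spans.

```javascript
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wraps whole-line "#" comments in <span class="comment">...</span>.
function highlightRubyComments(code) {
  return code
    .split("\n")
    .map((line) => {
      const escaped = escapeHtml(line);
      return /^\s*#/.test(line)
        ? '<span class="comment">' + escaped + "</span>"
        : escaped;
    })
    .join("\n");
}

const sample =
  "# I am a comment and should be lighter and italic\n" +
  "I = { :am => :normal_code, :and_want_no => :special_treatment }";

console.log(highlightRubyComments(sample));
```

The output would replace the text content of the `<pre>` or `<code>` element after the markdown renderer has run.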
a) What markdown renderer?
b) This can't be done with CSS using classes or IDs, nor with pseudo-elements
I will expand further as you do.
The problem is, you can't exactly render comments with your provided method, as these are technically never rendered in the first place.
Comments are meant to be non-runnable code that helps with debugging. Trying to add comments or manipulate comments would be a security breach and would require actually inserting a file into your appreciable code.
As far as that would go? That would be a tricky scenario unless you had the same comment or multiple files available to do so. I would say to just import your file if necessary with a duplicate version with a commented version.

Is it okay to use an id for <p> tags?

My coworker is telling me I shouldn't use id's for paragraph tags...
I think it's the way to go if you know you're only using that kind of paragraph once on the page.
He also says that all elements on a page should only use class and not id, unless you are defining a header, container, or footer.
I am fresh out of college and I learned to use id for things that will only show once on a page, while using class for things that will show multiple times on a page.
Which way is proper?
I'd say it's fine as long as you know that the paragraph will only be used once on the page.
An example might be a piece of company info that you want to appear on multiple pages but be styled in a particular way. Giving that an id singles it out as unique and allows you to style it as such.
The class attribute should be used for styling a number of controls in a similar way (i.e. all those that belong to that class). For example, report totals might always need to be large and bold, so the encompassing tags would be given a reportTotals class. There might be more than one report and more than one total per page, but they should all look the same.
Yes, it's OK. Every DOM member can have an ID. There is nothing fundamentally wrong with a <p> tag using an ID.
Now, for applying CSS, it may be better to use a class attribute instead of an id on the <p>, because you might want to apply that style to several paragraphs. But it's just a matter of convenience.
From a semantic point of view, I would say that if the ids help split up the document structurally then using ids makes a lot of sense. For instance, you may have:
<p id="beginParagraph" ...> </p> and <p id="endParagraph" ...> </p>
that help easily identify and locate your beginning and ending paragraphs. Keep in mind that you should not have duplicate ids however, and the example above could easily get out of hand if you have many paragraphs and wanted to add an id for each.
Check out this article on Classes vs. Ids for other reasons why one might be better than the other:
http://css-discuss.incutio.com/wiki/Classes_Vs_Ids
Use an id to refer to a specific element. Use class to refer to all elements of a specific type.
I think [you should use id on a paragraph tag] if you know you're only using that kind of paragraph once on the page.
You shouldn't use id just because you only happen to have one of them on your page. You should use id when you want to be sure to only affect that one specific element, both now and in the future when other people add more paragraphs. If you have a specific "kind" of paragraph there is nothing wrong with using a class to represent this, even if that class currently only has one member.
I find the use of ids for HTML elements, specifically paragraph tags, is very useful when running automation tests using tools such as Selenium or in my current project that uses AngularJS, when running end-to-end tests using Protractor. It is much easier to make expectations by element ids than using class selectors.
Yes, it is perfectly okay to use the id attribute on a paragraph element. Just make sure each paragraph gets its own unique id if you are using more than one.

How to express a page break semantically correct in HTML?

I'm editing books and articles in HTML. These texts were printed once; I scan them, convert them into an intermediate XML format, and then transform them into HTML (with XSLT). Because some of these texts are out of print today and are only available through the major libraries, I want to publish them in a way that lets people cite them by referring to the page numbers of the original document. For this purpose my intermediate XML format has an element that marks a page break. Right now I'm working on the XML->HTML transformations and I'm wondering how to represent these page breaks in HTML. They should not appear in the final HTML by default (so a simple | doesn't fit), but I plan to wrap these documents with some lightweight JavaScript that will show the markers when needed. I thought about <span>s containing a | that are hidden by default.
Is there a better, possibly 'semantic' way to this problem?
Page breaks are very much a thing of layout, and HTML isn't designed to describe layout, so you aren't going to find anything that is semantic for this within the language.
The best you can hope for is some sort of kludge.
Since a page break can occur in the middle of a paragraph, and <p> elements can contain only inline elements, you can eliminate most of the options from the outset.
The two possibilities that suggest themselves to me are <span> and <a>. The former has no semantics; the latter is designed to be linked to (with a name attribute) or from (with an href attribute), and you could consider a page from an original document something that you might wish to link to.
No matter what element you use, I wouldn't include a marker in it and then hide it with CSS. That sort of presentational flag is something I would consider adding via :before in a stylesheet (combined with a descendant selector for a body class that can be toggled with JS, since you want the toggle).
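The JS side of such a toggle can be very small. In this sketch the class name `show-page-breaks` and the function name are assumptions; the marker itself would come from a stylesheet rule along the lines of `body.show-page-breaks span.page-break:before { content: "|"; }`, where `page-break` is likewise a hypothetical class on the empty marker element.

```javascript
// Hypothetical class name that the stylesheet's descendant selector keys on.
const MARKER_CLASS = "show-page-breaks";

// Flips the class on the given element (normally document.body).
// classList.toggle returns true when the class is present afterwards.
function togglePageMarkers(body) {
  return body.classList.toggle(MARKER_CLASS);
}
```

In a page this would be wired to a button or link, for example by calling `togglePageMarkers(document.body)` from a click handler.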
Alternatively, if you want to take a (very) broad view of the meaning of "HTML" you could consider the l element (from the defunct XHTML 2 drafts) and mark up each line of the original document. Adding a class would indicate where a new page began (and you could use CSS counters and borders to clearly indicate each page and number it should you so wish). Pity the browser vendors refused to get behind a real semantic markup language and favoured HTML 5 instead.
Use a <div class="Page"> for each page, and have a stylesheet containing:
.Page {
page-break-after: always;
}
Maybe you can use an XML tag that is not parsed or interpreted by HTML, like <pagebreak/>.
That way the tag is not rendered when viewing the HTML, but with jQuery or any other JavaScript library you can transform these tags, when asked, into a standard or any other visual mark.
I think this can be a semantic approach...