Remove all inline html attributes, but leave some - html

I'm trying to write a PHP function with preg_replace that removes all inline attributes of HTML elements, but I want to keep some, like 'href', 'title' and 'alt'.
What I have so far is
([\w\-.:]+)\s*=\s*("[^"]*"|'[^']*'|[\w\-.:]+)
for matching all inline attributes, but it still matches text like
href="test" Test
that has no HTML around it. Additionally, it matches all inline attributes, including the ones I want to keep.
See my example text here:
https://regex101.com/r/3OVaO2/1
The goal is to remove any dangerous html elements.
I know that I have to handle something for the href-attribute in an extra function.

As already mentioned in the comments, Regex is not the way to go here.
That said: I have come up with this (https://regex101.com/r/3OVaO2/2)
(<\w+\s*[^>]*)\s(?!href|title|alt)[\w\-\d]+=(?:(['"]).*?\2|\w+)
However, this will only remove ONE evil attribute. The problem is that PCRE does not allow variable-length lookbehind assertions. If you switch the flavor to ECMAScript, you can do this (https://regex101.com/r/3OVaO2/3)
(?<=<\w+\s*[^>]*)\s(?!href|title|alt)[\w\-\d]+=(?:(['"]).*?\1|\w+)
This will probably do what you want. Nonetheless, this is NOT the holy grail for sanitizing HTML. Be careful with your output if you don't consider your input safe.
Also, the definition of the tags may need some tweaking, since there may be tags like <some-element>, which are currently not detected by the regular expression.
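As a rough sketch of how the ECMAScript variant above could be used, the pattern can be applied with the global flag in JavaScript so that every non-whitelisted attribute is removed in one call. The function name stripUnlistedAttributes is made up for this illustration, and the regex is the one from the answer, unchanged, so it inherits the same limits (unquoted values containing dots, tags like <some-element>, and so on). It assumes an engine with variable-length lookbehind support, such as a recent Node.js.

```javascript
// Hypothetical helper: applies the answer's ECMAScript regex globally.
// This is NOT a real sanitizer, only a demonstration of the pattern.
const UNLISTED_ATTRIBUTE =
  /(?<=<\w+\s*[^>]*)\s(?!href|title|alt)[\w\-\d]+=(?:(['"]).*?\1|\w+)/g;

function stripUnlistedAttributes(html) {
  // The lookbehind only succeeds inside an opening tag, so plain text
  // such as `href="test" Test` outside of any tag is left alone.
  return html.replace(UNLISTED_ATTRIBUTE, "");
}

console.log(
  stripUnlistedAttributes('<a href="x" onclick="evil()" title="t">link</a>')
);
```

Because the lookahead only checks the start of the attribute name, anything beginning with href, title or alt (for example hreflang) is kept as well.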

Related

Putting HTML within a <p> without it becoming elements [duplicate]

I've got a JS function which takes a string as a parameter and displays it in a div element. The string may contain HTML tags.
How do I force JS to display the inner text of div elements as HTML text, with the HTML tags shown? Also, what is an adequate way to filter particular tags, i.e. apply certain tags for styling and just print the others?
You just need to replace & and < (and optionally > if you like, but you don't have to) with their respective entities, using String#replace (spec, MDC) for instance.
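A minimal sketch of that replacement, assuming a helper named escapeHtml (the name is made up here). The ampersand has to be replaced first, otherwise the entities produced by the later replacements would be escaped a second time.

```javascript
// Hypothetical helper: escapes the characters that would otherwise
// be interpreted as markup when the string is inserted as HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;") // must come first
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;"); // optional, kept for symmetry
}

console.log(escapeHtml("<b>a & b</b>"));
// prints: &lt;b&gt;a &amp; b&lt;/b&gt;
```

The result can then be assigned as HTML, and the tags show up as literal text.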
And, also, what is an adequate way to filter particular tags, i.e. apply certain tags for styling and just print others.
Inserting user-supplied HTML directly into the page is dangerous because it opens you up to XSS. You should use some tool to sanitize the HTML code (here on StackOverflow, for example, you can use some HTML tags).
As posted in this question here on SO you can use this client-side sanitizer: http://code.google.com/p/google-caja/source/browse/trunk/src/com/google/caja/plugin/html-sanitizer.js
On the other hand, you may need to do this on the server side; which sanitizer to use depends on your environment (ASP.NET? PHP?).
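For the "apply certain tags, just print others" part, one simple approach (a sketch only, not a replacement for a real sanitizer) is to escape everything first and then restore a small set of attribute-less tags. The function name allowBasicTags and the choice of b, i, em and strong are assumptions made for this example.

```javascript
// Hypothetical helper: escape all markup, then re-enable a few bare tags.
// Tags with attributes (e.g. <b onclick=...>) stay escaped on purpose.
function allowBasicTags(text) {
  const escaped = text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  // Restore only exact, attribute-less opening and closing tags.
  return escaped.replace(/&lt;(\/?)(b|i|em|strong)&gt;/g, "<$1$2>");
}

console.log(allowBasicTags("<b>hi</b> <script>x</script>"));
// prints: <b>hi</b> &lt;script&gt;x&lt;/script&gt;
```

Since only exact bare tags are restored, no attribute ever reaches the page, which keeps the whitelist easy to reason about.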

Is using custom element without actually defining dangerous?

Instead of defining:
<div id="my-custom-element-101"></div>
I wrote:
<my-custom-element-101></my-custom-element-101>
But I didn't go further to extend HTMLElement and define it. This way I get some enhanced readability and don't need to do any further coding.
Is there any potential downside to this practice?
There's no absolute downside to that, as long as you use valid custom element notation (i.e. a name with a hyphen "-").
In this case it's just an unknown custom element.
Of course, if someone else decides to define a custom element with the same name, you could get into some trouble, but if you own the entire code of the page it can't happen.
Also note that, in your example, your tag <my-custom-element-101> is seen as an inline element, not a block.

Multi-line regex replace tags inside tags?

I want to replace single-line b/h2/h3/h4/h5 tags inside blockquote tags, with h6 tags.
So I want this:
^<(b|h[2-5])>([^\.]+)</\1>$
to be replaced with this:
<h6>\2</h6>
but only if it's within a blockquote tag, which is on different lines. I'm thinking the solution must involve a lookbehind for a closing blockquote tag AND a negative-lookbehind for an opening blockquote, but I'm not sure how to implement this.
Regular expressions are extremely bad for parsing arbitrary HTML, as many things can go wrong.
That being said: this demo may get you started.
This doesn't deal properly with edge cases.
<div><b>This thing</div></b>
will not parse properly.
If you know your input is well-formed and doesn't have too deep of nesting (a <b> within a <h2> within something else, for example), then it may work. But to parse HTML, you really need a DOM parser.
Now, this does not handle the "between blockquote tags" requirement, and with JavaScript (if that is what you're using) this isn't a very simple task (example). You essentially have to run the same process over and over to get all of the elements converted to h6.
If you were to use jQuery instead, you could do it much more safely: jsfiddle
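If you stay with plain string processing, the "only inside a blockquote" requirement could be approximated with two passes: first match each blockquote block, then replace the single-line tags inside that block only. This is a sketch under the assumption that blockquotes are well-formed and not nested; the function name headingsToH6InBlockquotes is made up for the example, and all the caveats about regex and HTML above still apply.

```javascript
// Hypothetical helper: two-pass regex replacement.
// Pass 1 finds each (non-nested) blockquote, pass 2 rewrites the
// single-line b/h2..h5 tags found inside that block.
function headingsToH6InBlockquotes(html) {
  const blockquote = /<blockquote\b[^>]*>[\s\S]*?<\/blockquote>/gi;
  // A whole line consisting of one tag pair, with no period in the text.
  const singleLineTag = /^<(b|h[2-5])>([^.\n]+)<\/\1>$/gm;
  return html.replace(blockquote, (block) =>
    block.replace(singleLineTag, "<h6>$2</h6>")
  );
}

const sample = [
  "<h2>Outside</h2>",
  "<blockquote>",
  "<h2>Inside</h2>",
  "<b>Bold line</b>",
  "</blockquote>",
].join("\n");

console.log(headingsToH6InBlockquotes(sample));
```

The callback form of replace avoids the lookbehind gymnastics from the question, because the blockquote boundary is handled by the outer match.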

Styling just comments inside a `pre` or `code` block with CSS

Is there a way to style comments inside a pre or code block (e.g. Ruby comments) using only CSS?
For example:
# I am a comment and should be lighter and italic
I = { :am => :normal_code, :and_want_no => :special_treatment }
I know you can use Javascript/jQuery to insert <span> elements in the right spots (like the <span>'s in the comment above provided by Stack Overflow) but can it be done with just CSS?
For background, I use a markdown renderer which outputs simple <pre> and <code> elements where necessary but without any hooks for indicating which language you're using and how to flag comments with <span> elements.
This task can't be done with just CSS.
CSS works at the element level and it is not possible to "select into" general text - even trivially, much less applying some rules to parse language grammar.
As noted, and as seen by inspecting the SO code rendering such as the one in this post, one approach is to output spans with the appropriate CSS classes (which are the result of separate grammar processing) - then these individual spans, which can be selected, are styled.
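A very naive sketch of that span approach for Ruby-style comments, assuming the code text is available as a string before it is placed in the pre/code block. The function name markRubyComments and the class name "comment" are made up for this example, and it is not a real tokenizer: a # inside a string literal would be treated as a comment too.

```javascript
// Hypothetical helper: escape the code, then wrap everything from '#'
// to the end of the line in a span that CSS can select.
function markRubyComments(code) {
  const escaped = code
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  // The entities produced above contain no '#', so they are not affected.
  return escaped.replace(/#.*$/gm, (c) => `<span class="comment">${c}</span>`);
}

console.log(
  markRubyComments("# I am a comment\nI = { :am => :normal_code }")
);
```

A rule such as `.comment { font-style: italic; }` can then style just the comments.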
a) What markdown renderer?
b) This can't be done with CSS using classes or IDs, nor with pseudo-elements
I will expand further as you do.
The problem is, you can't exactly render comments with your provided method, as these are technically never rendered in the first place.
Comments are meant to be non-runnable code that helps with debugging. Trying to add comments or manipulate comments would be a security breach and would require actually inserting a file into your appreciable code.
As far as that would go? That would be a tricky scenario unless you had the same comment or multiple files available to do so. I would say to just import your file if necessary with a duplicate version with a commented version.

Alternative to using span? [duplicate]

Here is an example.
<span id="s1">Hello</span>
<span id="s2">world</span>
<span id="s3">this</span>
<span id="s4">is</span>
<span id="s5">a</span>
<span id="s6">sentence.</span>
Basically, I have a script that separates the words of a sentence into spans. Is there a better approach to doing this? Perhaps an alternative to span that I don't know about? I thought of using something like <u> because it is short, then removing the default underlining. Also, <p> won't work because it is a block element.
Any ideas?
For semantics reasons, I'd advise against using other elements, unless there's some real need for you to have shorter element names. <span>s are semantically neutral elements, so they'd be ideal for this situation.
This is probably exactly what you should be doing if you really need to style each word differently. It's a meaningless tag used to group inline elements (in this case, words).
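For reference, a script like the one described in the question might look roughly like this. It is a sketch only: the function name wrapWords is made up, and the s1, s2, ... id scheme is copied from the question's example markup.

```javascript
// Hypothetical helper: wraps each whitespace-separated word in its own span.
function wrapWords(sentence) {
  // Minimal escaping so a word like "a<b" cannot break the markup.
  const escape = (word) => word.replace(/&/g, "&amp;").replace(/</g, "&lt;");
  return sentence
    .split(/\s+/)
    .filter(Boolean) // drop empty pieces from leading/trailing spaces
    .map((word, i) => `<span id="s${i + 1}">${escape(word)}</span>`)
    .join("\n");
}

console.log(wrapWords("Hello world this is a sentence."));
```

Each word ends up in a neutral inline element that can be targeted by id from CSS or script.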
The span element .. doesn't mean anything on its own ..
The span element is the only element in HTML that has been defined as not meaning anything as such, so it is the element to be used when you wish to make e.g. a word an element in order to manipulate it, without assigning any meaning to it.
However, an a element without an href attribute is also “semantically” empty and has no default rendering rules. Some people have used <a id=foo>...</a> instead of <span id=foo>...</span>. However, some programs may process such an a element in some special way (as if it were link-like anyway), and people may write style sheets in a manner that expects all a elements to be links. So such use of a is risky with no benefit beyond shortness. It also makes the source code less legible, since such use is not common.
In practice, you could, up to a point, use a custom tag, like z (with document.createElement('z') in JavaScript to make old versions of IE treat it as styleable). Browsers would treat it as unknown element, letting you handle it in scripting and (with the caveat) in CSS. But imagine what happens if some future version of HTML, or HTML as recognized by some browser, contains an element with the name you selected, with some fancy meaning and effect (like “don’t display this element” or “blink this text”).
I would agree with the answers Nightfirecat and imjared posted. <span> is probably the best element to use in this case as it denotes a neutral inline element.
However, if you really had to stretch a hack, you could try <em> since you are emphasising each word in its own way.